I am working on a data migration script to move all our data from our old LMS to our new Open edX platform. I’m sure many questions will come up along the way, so thank you in advance for your support and guidance.
We have a few layers to this migration project which we are approaching in phases:
- Rebuild all of our courses in Open edX
- Backfill user completion progress
- Backfill user performance on problems
At the moment I am primarily focused on phase 1, but I need to choose an approach that positions me to transition seamlessly to the later phases when I get there. Concretely, I am writing a Python script that creates new courses in Open edX by pulling each course's definition from our old LMS and translating it into the Open edX structure. Along the way, I am recording the mapping between legacy identifiers and their Open edX counterparts for each element in the course hierarchy (course, section, subsection, unit, problem). This will let me first QA the results against the original content, and later backfill user progress by translating completion IDs from the legacy system into their corresponding completions in the new system.
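For context, the mapping store is conceptually just one row per migrated element. A simplified sketch of the table (model and field names here are placeholders, not my actual schema):

```python
# Simplified sketch of the legacy-to-new id mapping table -- names are placeholders.
from django.db import models

class LegacyIdMapping(models.Model):
    legacy_id = models.CharField(max_length=255, db_index=True)
    # e.g. "course", "chapter" (section), "sequential" (subsection), "vertical" (unit), "problem"
    element_type = models.CharField(max_length=32)
    # Serialized Open edX course key or usage key of the newly created element
    new_usage_key = models.CharField(max_length=255)

    class Meta:
        unique_together = ("legacy_id", "element_type")
```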
So far I have successfully created the shell course hierarchy (course, section, subsection, and unit) programmatically, using Python functions such as `cms.djangoapps.contentstore.views.course.create_new_course` and `cms.djangoapps.contentstore.views.helpers.create_xblock`. Each time a new object is created, I store the mapping between the old ID and the new ID in a custom table I have defined. The algorithm has been fairly straightforward up to this point.
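To make that concrete, here is a stripped-down sketch of the hierarchy-building pass. The `legacy_fetch_children` helper is a placeholder for our old-LMS client, `LegacyIdMapping` is the mapping table sketched above, and the `create_xblock` keyword arguments are based on the version of edx-platform we are running, so they may differ on other releases:

```python
# Rough sketch of one level of the hierarchy pass -- legacy_fetch_children is a
# placeholder for our legacy-LMS client; keyword names may vary by release.
from cms.djangoapps.contentstore.views.helpers import create_xblock

def migrate_children(parent_xblock, legacy_parent, category, user):
    """Create one level of the hierarchy under parent_xblock and record id mappings."""
    for legacy_child in legacy_fetch_children(legacy_parent):  # placeholder helper
        new_block = create_xblock(
            parent_locator=str(parent_xblock.location),
            user=user,
            category=category,              # "chapter", "sequential", "vertical", ...
            display_name=legacy_child["title"],
        )
        LegacyIdMapping.objects.create(
            legacy_id=str(legacy_child["id"]),
            element_type=category,
            new_usage_key=str(new_block.location),
        )
```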
Now, however, I am at the point of filling in the unit content by adding the various XBlock instances with their appropriate metadata. The `create_xblock` function does not appear to support specifying field data at creation time, so follow-up calls are needed to populate and save each XBlock. Those calls generate new course versions, and it becomes trickier to make sure all the steps happen in the right order.
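For what it's worth, the follow-up population pass I have been experimenting with looks roughly like the sketch below. The field names are examples from our problem migration, and I am wrapping the calls in `bulk_operations()` in the hope of limiting intermediate course versions; I may well be misusing the API, so corrections are welcome:

```python
# Rough sketch of the second pass that fills in field data after creation.
from opaque_keys.edx.keys import UsageKey
from xmodule.modulestore.django import modulestore

def populate_problem(mapping, legacy_problem, user):
    """Fill in a previously created problem XBlock using the stored id mapping."""
    store = modulestore()
    usage_key = UsageKey.from_string(mapping.new_usage_key)
    # bulk_operations() is my attempt to avoid creating a course version per call.
    with store.bulk_operations(usage_key.course_key):
        block = store.get_item(usage_key)
        block.display_name = legacy_problem["title"]
        block.data = legacy_problem["capa_xml"]   # problem body translated to capa XML
        store.update_item(block, user.id)
```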
Is there an advised way to generate these XBlock instances programmatically? Is there a better function to use than `create_xblock`? Is there a clear list of steps to follow to make sure the course is updated successfully and remains consistent?
I have also considered the alternative approach of generating the XML (OLX) definition of each course and importing it programmatically, in order to leverage that well-tested tooling. So far, however, I have not found a way to maintain the mappings between legacy IDs and new IDs at each level of the course hierarchy while using the import tools. If building XML is the recommended approach, is there a way to achieve my requirements?
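If the XML route is preferred, the only idea I have come up with so far for preserving the mapping is to encode a deterministic `url_name` when generating the OLX, and then rebuild the mapping table from those names after import. Purely illustrative (the helper and dict keys are placeholders for our own data):

```python
# Illustrative only: write each problem's OLX with a deterministic url_name
# derived from the legacy id, so the mapping can be reconstructed after import.
import hashlib
from pathlib import Path
from xml.sax.saxutils import quoteattr

def write_problem_olx(course_dir: Path, legacy_problem: dict) -> str:
    """Write one problem's OLX file and return the url_name chosen for it."""
    url_name = "legacy_" + hashlib.sha1(str(legacy_problem["id"]).encode()).hexdigest()[:16]
    olx = (
        f"<problem display_name={quoteattr(legacy_problem['title'])}>"
        f"{legacy_problem['capa_xml']}"
        "</problem>"
    )
    problem_dir = course_dir / "problem"
    problem_dir.mkdir(parents=True, exist_ok=True)
    (problem_dir / f"{url_name}.xml").write_text(olx)
    return url_name  # recorded alongside legacy_problem["id"] so the mapping survives import
```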
Are there other tools I am missing? Any other general advice on how to approach this complex data migration?
Thank you again!