Programmatically create courses including all xblock children

I am working on a data migration script to move all our data from our old LMS to our new Open edX platform. I’m sure many questions will come up along the way, so thank you in advance for your support and guidance.

We have a few layers to this migration project which we are approaching in phases:

  1. Rebuild all of our courses in Open edX
  2. Backfill user completion progress
  3. Backfill user performance on problems

At the moment I am primarily focused on phase 1, but I have to choose an approach that lets me move seamlessly into the next phases when I get there. To that end, I am writing a Python script that creates new courses in Open edX by pulling the definition of each course from our old LMS and translating it into the Open edX structure. Along the way, I am recording the mapping between legacy identifiers and their Open edX counterparts for each element in the course hierarchy (course, section, subsection, unit, problem). This will allow me first to QA the results against the original content, and later to backfill user progress by translating completed IDs from the legacy system into their corresponding completions in the new system.

Thus far I have programmatically created the shell course hierarchy (course, section, subsection and unit) using Python functions such as cms.djangoapps.contentstore.views.course.create_new_course and cms.djangoapps.contentstore.views.helpers.create_xblock. Each time a new object is created, I store the mapping between the old ID and the new ID in a custom table I have defined. The algorithm has been pretty straightforward up to this point.
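
For readers following along, here is a minimal sketch of what such a mapping table could look like. It uses an in-memory SQLite database purely for illustration; the table name, column names and helper functions are all hypothetical and are not the ones used in the actual migration script.

```python
import sqlite3


def open_mapping_store(path=":memory:"):
    """Open (or create) a SQLite database holding the legacy-to-edX ID map."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS id_mapping (
            level     TEXT NOT NULL,  -- course, section, subsection, unit, problem
            legacy_id TEXT NOT NULL,
            edx_id    TEXT NOT NULL,
            PRIMARY KEY (level, legacy_id)
        )
        """
    )
    return conn


def record_mapping(conn, level, legacy_id, edx_id):
    """Store one mapping; re-running the migration overwrites the old row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO id_mapping (level, legacy_id, edx_id) "
            "VALUES (?, ?, ?)",
            (level, str(legacy_id), str(edx_id)),
        )


def lookup_edx_id(conn, level, legacy_id):
    """Return the Open edX ID for a legacy ID, or None if it is unmapped."""
    row = conn.execute(
        "SELECT edx_id FROM id_mapping WHERE level = ? AND legacy_id = ?",
        (level, str(legacy_id)),
    ).fetchone()
    return row[0] if row else None
```

Keying on (level, legacy_id) rather than legacy_id alone means the same legacy number can be reused at different levels of the hierarchy without collisions.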

However, I have now reached the point of filling in the unit content by adding different XBlock instances with their appropriate metadata. The create_xblock function doesn’t appear to support specifying field data at creation time, so follow-up calls are needed to populate and save each XBlock. Those calls generate new course versions, and it gets tricky to make sure all the steps are done in the right order.

Is there a recommended way to generate these XBlock instances programmatically? Is there a better function to use than create_xblock? Is there a clear list of steps to follow to make sure that the course is successfully updated and remains consistent?

I have considered the alternative approach of generating the XML definition of a course and importing it programmatically in order to leverage that powerful tool. However, I have not yet identified a way to maintain the mappings between legacy IDs and new IDs at each level of the course hierarchy while using these import tools. If building XML is the recommended approach, is there a way to achieve my requirements?

Are there other tools I am missing? Any other general advice on how to approach this complex data migration?

Thank you again!

I think most folks do this via XML, as cc2olx does. I’m not sure what the best practices are there. @colin.fredericks or @pdpinch might have more insight, as both Harvard and MIT have built utilities to generate and manipulate XML for Open edX courses.

You can use the modulestore create_item method directly to create XBlocks programmatically. It allows specifying not only fields but also the last part of the ID, if you want to generate a specific ID (that’s easier to map to the IDs in the legacy system), rather than generating a new random ID. You can perhaps even include the legacy ID directly - see the example in the next part of my answer.
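
As a rough sketch of that idea: the helper below takes the store as a parameter and passes the legacy ID through as the block ID, along with the field data. The helper name and its arguments are hypothetical. The call shape (user ID, parent usage key, block type, then block_id and fields keywords) is my understanding of the modulestore create_child method, so check it against the release you are running.

```python
def add_block_with_legacy_id(store, user_id, parent_usage_key,
                             block_type, legacy_id, fields):
    """Create a child block whose ID suffix reuses the legacy ID.

    `store` is expected to behave like the Open edX modulestore.
    Assumption: create_child accepts `block_id` and `fields` keyword
    arguments; verify this against your Open edX version.
    """
    return store.create_child(
        user_id,
        parent_usage_key,
        block_type,
        block_id=str(legacy_id),  # last part of the new usage key
        fields=fields,            # field data set at creation time
    )
```

Inside a Studio shell the store would normally come from `from xmodule.modulestore.django import modulestore` followed by `store = modulestore()`. I believe wrapping a batch of these calls in `store.bulk_operations(course_key)` cuts down on the number of intermediate course versions, but treat that as something to test rather than a guarantee.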

If you do the import via XML (which is probably most common, as Dave mentioned), the ID of each XBlock is given by the file name. If your old IDs are alphanumeric, you can use the old ID as the filename. So if you use legacy_id.xml as the filename that contains the data for a given XBlock, then when you import it the new ID will be something like block-v1:org+course+run+type@html+block@legacy_id.
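
Here is a small sketch of writing such a file with only the Python standard library. The function name, the one-directory-per-block-type layout and the set of characters allowed in an ID are assumptions on my part, so compare the output with a course exported from your own instance before relying on it.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: letters, digits, underscores and hyphens are safe in a block ID.
SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def write_block_file(course_dir, block_type, legacy_id, display_name):
    """Write <course_dir>/<block_type>/<legacy_id>.xml and return its path."""
    legacy_id = str(legacy_id)
    if not SAFE_ID.fullmatch(legacy_id):
        raise ValueError(f"legacy ID {legacy_id!r} is not usable as a file name")

    type_dir = Path(course_dir) / block_type
    type_dir.mkdir(parents=True, exist_ok=True)

    # The element name is the block type; the file name carries the ID.
    element = ET.Element(block_type, {"display_name": display_name})
    path = type_dir / f"{legacy_id}.xml"
    path.write_text(ET.tostring(element, encoding="unicode"), encoding="utf-8")
    return path
```

For example, `write_block_file("course", "problem", "1234", "Quiz 1")` writes course/problem/1234.xml. The parent unit's XML would then need to point at that block by the same name for the import to pick it up.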

One thing to be aware of is that in Open edX the block type is part of the ID, so in theory block-v1:...+type@html+block@1234 and block-v1:...+type@problem+block@1234 are two different blocks with different IDs, because they are different types, even though the 1234 ID part is the same.
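
To make that concrete, the snippet below builds the ID strings in the format shown above with plain string formatting. The helper name is hypothetical; in real code you would more likely parse and build keys with the opaque_keys library that Open edX uses (for example UsageKey.from_string) instead of formatting strings by hand.

```python
def usage_key_string(org, course, run, block_type, block_id):
    """Format a usage key string in the block-v1 style shown above."""
    return f"block-v1:{org}+{course}+{run}+type@{block_type}+block@{block_id}"


html_key = usage_key_string("org", "course", "run", "html", "1234")
problem_key = usage_key_string("org", "course", "run", "problem", "1234")

# Same trailing ID, different block type, therefore two distinct blocks.
# A legacy-to-edX map should store the block type alongside the legacy ID.
mapping = {
    ("html", "1234"): html_key,
    ("problem", "1234"): problem_key,
}
```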

Speak my name and I shall appear.

Yes! We do a fair amount of work with XML directly. For instance, here’s a web page that spits out course outlines in XML. It’s written entirely in client-side JavaScript, so it doesn’t send us anything you put into it. Feel free to use it if you like. The code is MIT-licensed.

I also have a bunch of Python scripts. Most of them are more oriented toward batch modification of courses rather than creation, but I do have one that I built for the purpose of moving HBSO courses over to edX on the occasions we do that. That’s on a private repo, but I can check with my higher-ups and I suspect they’ll be happy with me sharing it. All of it is done “from scratch” with lxml and BeautifulSoup4.

Thank you all for your responses! Switching to use the modulestore create_child function did the trick and I now have a working course content migration solution (still completing all the details, but the approach works). Thanks @braden!!

@colin.fredericks I imagine my work is going to overlap with the tools you shared and your experience, so I’ll let you know if I circle back around in the near future.

@Jeff_Cohen would you consider writing a How-To doc or a blog post about the migration you’re doing? I bet there are people who would be interested in the details once you get through it.
