ADR for removing MongoDB from edx-platform

Overdue ADR PR:

I’ll keep it open for a couple of weeks to give folks time to comment.

Thank you.


This has been merged.

PRs related to this effort:

I’ll add more PRs to this thread as they come, for folks who are interested in following this. Thank you.

The second of three PRs needed to switch the handling of active versions to MySQL has merged:

The previous PR had to be reverted again, and now Take 3 of the data migration for course index data has been merged:

Merging soon on edx-platform is the final cutover of course “active version” index data from MongoDB to MySQL:

Much love and appreciation to @braden for doggedly pushing through this effort. :saluting_face:


@dave @braden thanks a lot for this huge effort. This topic was brought up in the last BTR meeting, and I would like to ask whether, after these changes, MongoDB is no longer needed to run edx-platform, or whether there are still some dependencies to resolve, such as the course assets storage.

It’s a step down a long road. MongoDB will definitely be needed for courseware in Nutmeg.

To give a little more context, there are a few independent pieces of deprecation that need to happen before MongoDB can be removed:

Old Mongo Removal

Remove Old Mongo support entirely. @Michael_Terry has done great work recently in cutting off access to Old Mongo courses:

What remains is a lot of code deletion and test fixing. This is an area where people can contribute with relatively little ramp-up, since it’s mostly deleting test permutations. Please comment here if you’re interested in that work.

Convert the Split Modulestore to use django-storages

This will require three parallelizable streams of work:

  1. Converting how we store Structure documents (the skeleton outline of a course) from MongoDB to django-storages, and migration scripts. This may also require some porting of the structures.py cleanup script, depending on the projected costs.
  2. Converting how we store Definition documents (the content of each block) from MongoDB to django-storages, and migration scripts. Because of access patterns and the variability of latency with object stores like S3, this would likely require an improved caching layer as well.
  3. Converting how we store course static assets (e.g. images, PDFs) to django-storages. We have to take care here to (a) not break CDN caching for certain assets; and (b) not break security restrictions for enrollment-restricted assets. The way we ship over static assets right now is implemented as middleware and is frankly kinda wacky, so there will likely be more cleanup here than appears at first glance.
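
To make the caching point in item 2 a bit more concrete, here is a rough sketch of a read-through cache sitting in front of a slow object store. Nothing here is taken from edx-platform: `InMemoryStorage`, `CachedDefinitionStore`, the `definitions/<id>.json` key layout, and the JSON encoding are all hypothetical, and the in-memory class only stands in for whatever django-storages backend would really be used. It also assumes Definition documents are written once per ID and never changed afterwards, which is why there is no invalidation.

```python
import json
from collections import OrderedDict


class InMemoryStorage:
    """Hypothetical stand-in for an object store such as S3."""

    def __init__(self):
        self._blobs = {}
        self.reads = 0  # counts round trips to the "remote" store

    def save(self, name, data):
        self._blobs[name] = data

    def load(self, name):
        self.reads += 1
        return self._blobs[name]


class CachedDefinitionStore:
    """Read-through LRU cache in front of a slow object store.

    Assumes documents are write-once per ID, so cached entries never go stale.
    """

    def __init__(self, storage, max_entries=1024):
        self._storage = storage
        self._cache = OrderedDict()
        self._max_entries = max_entries

    def _key(self, definition_id):
        # Hypothetical key layout, not an existing edx-platform convention.
        return f"definitions/{definition_id}.json"

    def put(self, definition_id, document):
        payload = json.dumps(document).encode("utf-8")
        self._storage.save(self._key(definition_id), payload)

    def get(self, definition_id):
        if definition_id in self._cache:
            # Cache hit: mark as most recently used, skip the object store.
            self._cache.move_to_end(definition_id)
            return self._cache[definition_id]
        # Cache miss: one round trip to the store, then remember the result.
        payload = self._storage.load(self._key(definition_id))
        document = json.loads(payload.decode("utf-8"))
        self._cache[definition_id] = document
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the least recently used
        return document
```

With this shape, repeated reads of the same block only pay the object-store latency once per process. A real version would probably want a shared cache across workers rather than a per-process dictionary, but that is a design decision for whoever picks up the work.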

Remove MongoDB usage from Forums

Once the two sections above this are finished, it’s possible to have a basic install of Open edX without MongoDB. The last piece that I know of that actively uses MongoDB is the forums experience. I don’t know what the current plans for deprecation of this usage are. The last I recall talking with anyone about it, the general idea was that we wanted to switch away from MongoDB and towards the Django ORM, but only after removing the Ruby code. But again, I’m not sure where that stands now.

At the very least, if the other sections are completed, MongoDB can be a dependency of only the forums, and not Open edX as a whole.


You can follow this overall roadmap item here:


@dave

Why are we removing MongoDB as a dependency for the platform?

Are there going to be directions or a script to move the existing courses from MongoDB to MySQL?

cc @becdavid @traek728

The reasons for removing MongoDB were outlined in the ADR mentioned at the beginning of this thread, which has now been merged at edx-platform/0002-remove-mongodb-dependency.rst at master · openedx/edx-platform · GitHub. I’m pretty sure there will be some code to assist with migrations to the new course data storage system.
