dave
(Dave Ormsbee)
October 24, 2021, 8:54pm
1
Overdue ADR PR:
edx:master
← edx:ormsbee/no-mongo-adr
opened 08:52PM - 24 Oct 21 UTC
ADR for removing MongoDB as a dependency of edx-platform.
I’ll keep it open for a couple of weeks to give folks time to comment.
Thank you.
7 Likes
dave
(Dave Ormsbee)
November 12, 2021, 12:38am
3
PRs related to this effort:
edx:master
← open-craft:split-modulestore-mysql-take2
opened 12:21AM - 19 Oct 21 UTC
## Description
This is a revised version of #27565 with some minor changes, as the prior attempt had to be reverted (#28979).
## What has changed
There are two changes since the first version of this PR:
1. The data migration has been removed for now. Instead, this PR will immediately start writing to MySQL+MongoDB but continue to read from MongoDB only. This way no writes will be lost. A future PR will migrate the data for any courses that don't yet appear in MySQL, and a third PR will switch reads to MySQL. [See discussion](https://github.com/edx/edx-platform/pull/28979#issuecomment-942429562)
See a876bde5e729e0868f16a34c0c1a63f8bf3a2db9 for this change.
2. The `course_id` column is now case-sensitive, for compatibility with MongoDB. Although the platform code generally [tries to prevent having two courses whose IDs differ only by case](https://github.com/edx/edx-platform/blob/ac8b4f5a6dfdc4ccc6433aeae60d42f6aa341207/common/lib/xmodule/xmodule/modulestore/split_mongo/split.py#L1833-L1838) (note `ignore_case=True`), we found at least one pair of courses on stage that differs only by case in its `org` ID (`edx` vs `edX`). So for backwards compatibility with MongoDB and to avoid issues for anyone else with similar course IDs that differ only by case, we've made the new version case sensitive too. The system still tries to prevent creation of courses that differ only by case (that hasn't changed), but now the MySQL version won't break if that has somehow happened. [See discussion](https://github.com/edx/edx-platform/pull/28979#issuecomment-938001847)
See 9e03d0f5179af7ce409cefe14983154feeecda43 for this change.
## Testing instructions
Check out this PR, run the migrations, then verify that courses are still listed and accessible in Studio (despite no data migration).
To test the case sensitivity issue, run this PR's migration then open an LMS django shell (`./manage.py lms shell`) and run the commands in [this gist](https://gist.github.com/bradenmacdonald/d280ae7124257f06f70dd9ce08178fd8). If you do that on the old version, it will give errors, but on the new version (with the modified migration), it works. A unit test for this has been added which checks the same thing.
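As a rough illustration of why the column's case sensitivity matters, the sketch below uses SQLite from the Python standard library as a stand-in for MySQL. It is not the content of the gist: the table names `ci` and `cs`, the course IDs, and the use of SQLite's `NOCASE` and `BINARY` collations to approximate case-insensitive and case-sensitive MySQL columns are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE approximates a case-insensitive column; BINARY a case-sensitive one.
conn.execute("CREATE TABLE ci (course_id TEXT COLLATE NOCASE UNIQUE)")
conn.execute("CREATE TABLE cs (course_id TEXT COLLATE BINARY UNIQUE)")

# Two hypothetical course IDs that differ only by the case of the org.
ids = ["course-v1:edx+Demo+2021", "course-v1:edX+Demo+2021"]


def insert_all(table):
    """Return True if both IDs can be stored in the given table."""
    try:
        for course_id in ids:
            conn.execute(f"INSERT INTO {table} (course_id) VALUES (?)", (course_id,))
        return True
    except sqlite3.IntegrityError:
        # The unique constraint treated the two IDs as the same value.
        return False


case_insensitive_ok = insert_all("ci")
case_sensitive_ok = insert_all("cs")
print(case_insensitive_ok, case_sensitive_ok)
```

With the case-insensitive column the second insert violates the unique constraint, whereas the case-sensitive column accepts both IDs, which mirrors the behavior the PR aims for.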
edx:master
← open-craft:braden/course-indexes-mysql-4-data-migration-take2
opened 03:25AM - 10 Nov 21 UTC
## Description
This PR is a repeat of https://github.com/edx/edx-platform/pull/29144 which [had to be reverted](https://github.com/edx/edx-platform/pull/29212).
This version has similar code but when encountering unexpected data, it will just log it and proceed rather than throwing an error. [See this discussion](https://github.com/edx/edx-platform/pull/29212#issuecomment-959780731).
If the error occurs again, the log output will look like this:
```
2021-11-10 03:19:06,653 ERROR 449 [common.djangoapps.split_modulestore_django.migrations.0002_data_migration] [user None] [ip None] 0002_data_migration.py:36 - Possible data issue found during data migration of course indexes from MongoDB to MySQL:
Course course-v1:edX+DemoX+Demo_Course already exists in MySQL but the MongoDB version is newer. That's unexpected because since the course index table was added to MySQL, there has never been a time when we would write course_indexes updates only to MongoDB without also writing to MySQL.
Mongo data: edited_on: 2021-11-03 22:23:48.989000+00:00, last_update: 2021-11-03 22:24:14.439000+00:00, published_version: 61830c0dca80abc00d1b233f
MySQL data: edited_on: 2021-10-01 11:38:42.219000+00:00, last_update: 2021-10-01 11:38:42.219000+00:00, published_version: bc00d1b233f61830c0dca80a
The MySQL version will be overwritten and the MongoDB version used.
```
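The "log it and proceed" behavior described above might look something like the following sketch. It is not the actual migration: the function `migrate_course_indexes`, the dict-based stand-ins for both databases, and the choice of `last_update` as the field used to decide which copy is newer are assumptions for illustration.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger("course_index_migration_sketch")


def migrate_course_indexes(mongo_indexes, mysql_indexes):
    """Copy course indexes from MongoDB into MySQL (both modeled as dicts).

    Returns the IDs of courses whose data looked unexpected.
    """
    suspicious = []
    for course_id, mongo_doc in mongo_indexes.items():
        mysql_doc = mysql_indexes.get(course_id)
        if mysql_doc is None:
            # Course predates dual writes: copy it over.
            mysql_indexes[course_id] = dict(mongo_doc)
        elif mongo_doc["last_update"] > mysql_doc["last_update"]:
            # Unexpected, but log and proceed rather than raising.
            log.error(
                "Possible data issue: %s is newer in MongoDB; "
                "the MySQL version will be overwritten.",
                course_id,
            )
            mysql_indexes[course_id] = dict(mongo_doc)
            suspicious.append(course_id)
        # Otherwise the MySQL row is already current and is left unchanged.
    return suspicious


older = datetime(2021, 10, 1, tzinfo=timezone.utc)
newer = datetime(2021, 11, 3, tzinfo=timezone.utc)
mongo_indexes = {
    "course-v1:edX+DemoX+Demo_Course": {"last_update": newer},
    "course-v1:edX+Legacy+2019": {"last_update": older},
    "course-v1:edX+Synced+2021": {"last_update": older},
}
mysql_indexes = {
    "course-v1:edX+DemoX+Demo_Course": {"last_update": older},
    "course-v1:edX+Synced+2021": {"last_update": older},
}
flagged = migrate_course_indexes(mongo_indexes, mysql_indexes)
```

Returning the flagged IDs instead of raising lets the migration finish while still leaving a trail for later investigation.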
## Supporting information
See previous PRs.
## Testing instructions
1. Check out a master devstack. Make some changes to a course or two (maybe a library too).
1. At http://localhost:18000/admin/split_modulestore_django/splitmodulestorecourseindex/ verify that any courses that you've modified are already copied into MySQL. Keep this tab open.
1. Check out this PR and run LMS migrations
1. Open http://localhost:18000/admin/split_modulestore_django/splitmodulestorecourseindex/ in a new tab, and compare to the prior version. Courses that were already present in MySQL should be unchanged, and all remaining courses from your devstack should now be listed there.
## Deadline
None
I’ll add more PRs to this thread as they come, for folks who are interested in following this. Thank you.
dave
(Dave Ormsbee)
November 22, 2021, 3:57pm
4
The second of three PRs needed to switch the handling of active versions to MySQL has merged:
edx:master
← open-craft:braden/course-indexes-mysql-4-data-migration-take2
opened 03:25AM - 10 Nov 21 UTC
dave
(Dave Ormsbee)
November 29, 2021, 4:31pm
5
The previous PR had to be reverted again, and now Take 3 of the data migration for course index data has been merged:
edx:master
← open-craft:braden/course-indexes-mysql-4-data-migration-take3
opened 08:35PM - 23 Nov 21 UTC
## Description
This PR is a repeat of https://github.com/edx/edx-platform/pull/29144 and #29293 which both had to be reverted.
This version is identical to the previous version but fixes a bug: when trying to log details about a course with data that needs to be investigated, this version will no longer cause an exception if the course has no `published_version` (the course was never published). The bugfix is 4cc8a6b8bf510fd6c957ac16d6946c1e6c51f22c.
When this migration runs, it is expected to produce some output like the following, which we'll need to investigate afterward:
```
2021-11-10 03:19:06,653 ERROR 449 [common.djangoapps.split_modulestore_django.migrations.0002_data_migration] [user None] [ip None] 0002_data_migration.py:36 - Possible data issue found during data migration of course indexes from MongoDB to MySQL:
Course course-v1:edX+DemoX+Demo_Course already exists in MySQL but the MongoDB version is newer. That's unexpected because since the course index table was added to MySQL, there has never been a time when we would write course_indexes updates only to MongoDB without also writing to MySQL.
Mongo data: edited_on: 2021-11-03 22:23:48.989000+00:00, last_update: 2021-11-03 22:24:14.439000+00:00, published_version: 61830c0dca80abc00d1b233f
MySQL data: edited_on: 2021-10-01 11:38:42.219000+00:00, last_update: 2021-10-01 11:38:42.219000+00:00, published_version: bc00d1b233f61830c0dca80a
The MySQL version will be overwritten and the MongoDB version used.
```
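The bug fixed in this version (logging a course that has no `published_version`) suggests a defensive formatting helper along these lines. This is only a sketch and not the code from the bugfix commit: the function `describe_index`, the flat dict layout, and the `(never published)` placeholder are hypothetical.

```python
def describe_index(label, index):
    """Format one course index for a log line without assuming it was published."""
    published = index.get("published_version")
    # A never-published course has no published version; avoid raising on it.
    published_text = str(published) if published is not None else "(never published)"
    return (
        f"{label} data: edited_on: {index.get('edited_on')}, "
        f"last_update: {index.get('last_update')}, "
        f"published_version: {published_text}"
    )


published_line = describe_index(
    "Mongo",
    {
        "edited_on": "2021-11-03",
        "last_update": "2021-11-03",
        "published_version": "61830c0dca80abc00d1b233f",
    },
)
draft_only_line = describe_index(
    "MySQL", {"edited_on": "2021-10-01", "last_update": "2021-10-01"}
)
```

Using `dict.get` with an explicit fallback means the diagnostic path cannot itself abort the migration.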
dave
(Dave Ormsbee)
March 8, 2022, 2:43pm
6
Merging soon on edx-platform is the final cutover of course “active version” index data from MongoDB to MySQL:
openedx:master
← open-craft:braden/course-indexes-mysql-5-read-cutover
opened 07:03PM - 01 Nov 21 UTC
## Description
This is a follow up to #29058 and #29413. This is the next step in moving part of the modulestore data (the course indexes / "active versions" table) from MongoDB to MySQL.
There are four steps planned in moving course index data to MySQL:
1. Step 1: create the tables in MySQL, start writing to MySQL + MongoDB ✅ [done](https://github.com/edx/edx-platform/pull/29058)
1. Step 2: migrate all remaining courses to MySQL ✅ [done](https://github.com/edx/edx-platform/pull/29413)
1. Step 3: switch reads from MongoDB to MySQL (**this PR**)
1. Step 4 (much later, once we know this is working well): stop writing to MongoDB altogether.
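One way to picture the four steps is as a table of which store is read and which stores are written at each phase. The sketch below is an assumption-laden summary of the list above, not code from edx-platform: the `Phase` enum and the `read_source` and `write_targets` helpers are hypothetical names.

```python
from enum import Enum


class Phase(Enum):
    DUAL_WRITE = 1       # Step 1: tables exist, writes go to MySQL + MongoDB
    BACKFILLED = 2       # Step 2: remaining courses migrated to MySQL
    READ_FROM_MYSQL = 3  # Step 3: reads switch from MongoDB to MySQL
    MYSQL_ONLY = 4       # Step 4: stop writing to MongoDB altogether


def read_source(phase):
    """Which store serves reads of course index data in this phase."""
    if phase.value >= Phase.READ_FROM_MYSQL.value:
        return "mysql"
    return "mongodb"


def write_targets(phase):
    """Which stores receive writes of course index data in this phase."""
    if phase is Phase.MYSQL_ONLY:
        return ("mysql",)
    return ("mysql", "mongodb")


rollout = {phase.name: (read_source(phase), write_targets(phase)) for phase in Phase}
```

Keeping MongoDB writes on through step 3 means reads can be switched back to MongoDB if the cutover misbehaves.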
## Supporting information
OpenCraft Jira ticket: MNG-2557
## Status
Testing with a large Open edX instance is in progress.
## Testing instructions
Try making changes in Studio and verify that they work fine.
## Deadline
None
Much love and appreciation to @braden for doggedly pushing through this effort.
5 Likes
@dave @braden thanks a lot for this huge effort. This topic was brought up in the last BTR meeting and I would like to ask you if after these changes MongoDB is no longer needed to run edx-platform, or if there are still some dependencies to resolve like the Course assets storage.
dave
(Dave Ormsbee)
March 14, 2022, 5:10pm
8
It’s a step down a long road. MongoDB will definitely be needed for courseware in Nutmeg.
dave
(Dave Ormsbee)
March 15, 2022, 1:41pm
9
To give a little more context, there are a few independent pieces of deprecation that need to happen before MongoDB can be removed:
Old Mongo Removal
Remove Old Mongo support entirely. @Michael_Terry has done great work recently in cutting off access to Old Mongo courses:
What remains is a lot of code deletion and test fixing. This is an area where people can contribute with relatively little ramp-up, since it’s mostly deleting test permutations. Please comment here if you’re interested in that work.
Convert the Split Modulestore to use django-storages
This will require three parallelizable streams of work:
Converting how we store Structure documents (the skeleton outline of a course) from MongoDB to django-storages, and migration scripts. This may also require some porting of the structures.py cleanup script, depending on the projected costs.
Converting how we store Definition documents (the content of each block) from MongoDB to django-storages, and migration scripts. Because of access patterns and the variability of latency with object stores like S3, this would likely require an improved caching layer as well.
Converting how we store course static assets (e.g. images, PDFs) to django-storages. We have to take care here to (a) not break CDN caching for certain assets; and (b) not break security restrictions for enrollment-restricted assets. The way we ship over static assets right now is implemented as middleware and is frankly kinda wacky, so there will likely be more cleanup here than appears at first glance.
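To make the caching concern for Definition documents a bit more concrete, here is a minimal sketch of a read-through LRU cache in front of an object store. Everything in it is an assumption for illustration: `InMemoryObjectStore` is a toy stand-in and does not reproduce the django-storages API, `CachedDefinitionStore` is a hypothetical name, and the sketch assumes a stored document is never modified under the same ID, so cached copies never go stale.

```python
import json
from collections import OrderedDict


class InMemoryObjectStore:
    """Toy stand-in for an object store such as S3; counts reads."""

    def __init__(self):
        self.blobs = {}
        self.reads = 0

    def save(self, name, data):
        self.blobs[name] = data

    def load(self, name):
        self.reads += 1  # each call models a slow network round trip
        return self.blobs[name]


class CachedDefinitionStore:
    """Hypothetical read-through LRU cache in front of the object store."""

    def __init__(self, storage, max_entries=1024):
        self.storage = storage
        self.max_entries = max_entries
        self.cache = OrderedDict()

    def put(self, definition_id, definition):
        self.storage.save(f"definitions/{definition_id}.json", json.dumps(definition))

    def get(self, definition_id):
        if definition_id in self.cache:
            self.cache.move_to_end(definition_id)  # mark as recently used
            return self.cache[definition_id]
        raw = self.storage.load(f"definitions/{definition_id}.json")
        definition = json.loads(raw)
        self.cache[definition_id] = definition
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return definition


backend = InMemoryObjectStore()
definitions = CachedDefinitionStore(backend, max_entries=2)
definitions.put("abc123", {"data": "<p>Hello</p>"})
first = definitions.get("abc123")
second = definitions.get("abc123")
```

The second `get` is served from memory, so the backend is read only once. A real deployment would more likely use a shared cache than a per-process dictionary.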
Remove MongoDB usage from Forums
Once the two sections above this are finished, it’s possible to have a basic install of Open edX without MongoDB. The last piece that I know of that actively uses MongoDB is the forums experience. I don’t know what the current plans for deprecation of this usage are. The last I recall talking with anyone about it, the general idea was that we wanted to switch away from MongoDB and towards the Django ORM, but only after removing the Ruby code. But again, I’m not sure where that stands now.
At the very least, if the other sections are completed, MongoDB can be a dependency of only the forums, and not Open edX as a whole.
You can follow this overall roadmap item here:
opened 02:50PM - 26 Oct 21 UTC
redwood
developer
TLDR;
MongoDB introduces excessive hosting and maintenance cost for the value that we derive from it on the platform. With improvements to serializing courses in the RDBMS, it's even less valuable. Removing it will let us reduce cost and maintenance burden, scale down better, and thereby scale platform adoption.
The full details of this are [here](https://github.com/edx/edx-platform/blob/master/common/lib/xmodule/xmodule/docs/decisions/0002-remove-mongodb-dependency.rst).
Related PRs:
- https://github.com/edx/edx-platform/pull/29098
- https://github.com/edx/edx-platform/pull/27565
- https://github.com/edx/edx-platform/pull/29058
- https://github.com/edx/edx-platform/pull/29144
- https://github.com/edx/edx-platform/pull/29293
2 Likes
@dave
Why are we removing MongoDB as a dependency for the platform?
Are there going to be directions or a script to move the existing courses from MongoDB to MySQL?
cc @becdavid @traek728
jmbowman
(Jeremy Bowman)
October 27, 2022, 2:04pm
11
The reasons for removing MongoDB were outlined in the ADR mentioned at the beginning of this thread, which has now been merged as 0002-remove-mongodb-dependency.rst on master of openedx/edx-platform. I’m pretty sure there will be some code to assist with migrations to the new course data storage system.
2 Likes