500 errors accessing an older course in our LMS, modulestore values out of sync?

(originally taken from the Open edX Slack instance)

I am curious if someone could help me unravel a mystery around 500 errors we see occasionally from the LMS in the course_modes app’s views.py.

Our system is failing to retrieve a course run’s info from Mongo but I’m a little stumped as to why.
Link to offending piece of code: https://github.com/openedx/edx-platform/blob/05890d25b69ab618fb166964c53f4ed3d70e8a77/common/djangoapps/course_modes/views.py#L362-L378

Error observed:

builtins.AttributeError: ‘NoneType’ object has no attribute ‘has_started’
… {StackTrace} …
File “/edx/app/edxapp/edx-platform/common/djangoapps/course_modes/views.py”, line 142, in get
return self._redirect_to_course_or_dashboard(course, course_key, request.user)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/edx/app/edxapp/edx-platform/common/djangoapps/course_modes/views.py”, line 415, in _redirect_to_course_or_dashboard
if course.has_started() or user.is_staff:
^^^^^^^^^^^^^^^^^^
AttributeError: ‘NoneType’ object has no attribute ‘has_started’

So it seems like a null course has been passed to the _redirect_to_course_or_dashboard function… but I’m trying to understand why it’s null. We attempt to retrieve the course run here: edx-platform/common/djangoapps/course_modes/views.py at 05890d25b69ab618fb166964c53f4ed3d70e8a77 · openedx/edx-platform · GitHub

Querying Mongo directly, I’m able to retrieve information about this course run… (some data scrubbed for sharing purposes)
db.modulestore.active_versions.find({"course": "some-course", "org": "some-org", "run": "1T2022"}).pretty()

{
“_id” : ObjectId(“621727931f9260e1e9a873ba”),
“org” : “some-org”,
“course” : “some-course”,
“run” : “1T2022”,
“edited_by” : 10442141,
“edited_on” : ISODate(“2022-02-24T06:37:07.373Z”),
“last_update” : ISODate(“2022-03-11T05:08:57.515Z”),
“versions” : {
“draft-branch” : ObjectId(“622ad9695eaf4ae29c569336”),
“published-branch” : ObjectId(“622ad9695eaf4ae29c569337”)
},
“schema_version” : 1,
“search_targets” : {
“wiki_slug” : “some-course”
}
}

I was also able to successfully retrieve course info via Mongo using the draft-branch and published-branch ObjectIDs.

Digging a bit further, I see another error in our logs related to my 500 error:

Traceback (most recent call last):
File “/edx/app/edxapp/venvs/edxapp/lib/python3.11/site-packages/edx_django_utils/monitoring/internal/utils.py”, line 112, in function_trace
yield
File “/usr/lib/python3.11/contextlib.py”, line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File “/edx/app/edxapp/edx-platform/xmodule/modulestore/split_mongo/split_draft.py”, line 61, in get_course
return super().get_course(course_id, depth=depth, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/edx/app/edxapp/edx-platform/xmodule/modulestore/split_mongo/split.py”, line 1106, in get_course
return self._get_structure(course_id, depth, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/edx/app/edxapp/edx-platform/xmodule/modulestore/split_mongo/split.py”, line 1094, in _get_structure
structure_entry = self._lookup_course(structure_id, head_validation=head_validation)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/edx/app/edxapp/edx-platform/xmodule/modulestore/split_mongo/split.py”, line 860, in _lookup_course
raise ItemNotFoundError(f’Structure: {version_guid}')
xmodule.modulestore.exceptions.ItemNotFoundError: Structure: 623043c73b501fd206fdc2d6

Are we trying to retrieve the course run’s info with an ObjectId of 623043c73b501fd206fdc2d6? Shouldn’t this match one of the versions of the course stored in Mongo (assuming the published-branch version?)

Thanks to advice from @dave, I then looked at the split_modulestore_django_splitmodulestorecourseindex table and discovered that the draft_version and published_version pointers for the course run seem to be out of sync:

SELECT course_id, objectid, draft_version, published_version FROM split_modulestore_django_splitmodulestorecourseindex where course_id = 'course-v1:some-org+some-course+1T2022';

Which resulted in the following values being returned:
objectid: 621727931f9260e1e9a873ba
draft_version: 623043c2949df6cd1bc8febe
published_version: 623043c73b501fd206fdc2d6.

Has this happened to anyone else? Any words of advice? Could I manually update the pointers in the split_modulestore_django_splitmodulestorecourseindex table to the current values in Mongo and “fix” the course? Any gotchas here?

Thanks!

So the big concern here is that if things get out of sync, it’s possible that the structures pruning script can remove a live state of content (which might be what happened here). So something like this:

  • A new version (structure doc) is written. The write succeeds for MySQL but not MongoDB.
  • The pruning script looks at MongoDB. It doesn’t see the new version and thus assumes its an orphan that should be removed.
  • Now the MySQL side is referencing a structure doc that no longer exists.

This can be even more mysterious because I believe it’s possible for a structure doc to live in the cache for a long period of time, even if it’s been deleted in MongoDB. So even if the doc doesn’t exist in MongoDB, you can sometimes load it in Studio because it will read off the cache, and then publish it to force a real write. I’ve only seen things get into this state one time however, and my understanding was that the bug in question was fixed years ago.

One thing to consider is addressing the root of the problem:

If we can update the pruning script, we should be able to kill this double-write across databases. At the time this was written, it would have been harder to implement this because the code for pruning was in the tubular repo. But since that time, it’s been moved into edx-platform. Very specifically, it’s this:

So I think you can:

  1. Modify the structures.py script to run as a Django standalone script, i.e. use django.setup().
  2. Either modify or subclass SplitMongoBackend in the splitmongo.py module above to override where the active versions information come from (pull them from SplitModuleStoreCourseIndex)

Important notes:

  • Be careful about data types. I don’t remember where we might be assuming that we’re passing back BSON ObjectIds vs. strings.
  • The structures.py script splits plan generation from pruning, so it should be straightforward to test that plan generation with the new source is working without doing anything potentially destructive.
  • This script tries to be very careful about memory usage.The active_versions collection is very small, relatively speaking, so this shouldn’t bite you. But please be very careful about resource usage in general. The first pruning that we went through at edX took 2 hours for the plan generation and 8.5 hours for pruning. The incremental prunings since then have been much smaller. I think you might only need to run Django code for the plan generation, and if you can run without it for the pruning, that’s probably better.
  • There are a lot of comments in structures.py and splitmongo.py. Please read through it. I tried very hard to make that thing as safe as possible to run when I wrote it, but anything that deletes content should be treated with great suspicion.
  • There are eduNEXT folks like @mgmdi and @Felipe who have had to deal with issues around pruning and might have more insight.

Good luck!

2 Likes