When making changes on the Course Schedule and Details page in the CMS and saving them - the automatic reindexing task is fired, but the outdated info is stored in the index.
Prerequisites
course indexing is enabled (FEATURES['ENABLE_COURSEWARE_INDEX'] = True)
course search is enabled (FEATURES['ENABLE_COURSE_DISCOVERY'] = True)
celery is configured
Example (i.e. Steps to Reproduce)
Go to the Studio
Navigate to any course and go to the Schedule and Details section
go to <LMS>/courses/<course_id>/about and confirm your image was updated (search index is not used here)
repeat steps 3 and 4, because in some cases (e.g. workers just restarted) the issue is not reproducible on the first try. Here is my newly uploaded course image:
the previous (outdated) image is shown on the /courses page, but it’s up to date on all the pages that use the data from the DB, not from the index (like on the course about page)
Releases:
Nutmeg
Olive
Investigation Notes
The task looks ok, but the info it receives from the modulestore is outdated (e.g. it receives the old image path with the course_image_url(course) call) for some reason. The triggered at time in the logs is correct, the queue looks correct too.
reloading of the workers fixes the issue for some time (at least one correct update)
replacing <task>.delay with task.apply_async to apply the countdown (I’ve chosen a 5-sec countdown randomly) fixes the issue permanently:
the issue isn’t reproducible if to call the task as a regular function (without .delay or .apply_async)
Using time.sleep and trying to get the course object with the delay (I’ve tried up to 15sec) has no effect - the course image is always outdated
the issue isn’t reproduced if using debug (for the CourseDetails.update_from_json method) and walking the code step-by-step. The course_published signal is fired twice at some moment (didn’t find the reason yet)
I didn’t find any caching issues according to getting the course data from mongo for the index update
I think the index is probably stored in Redis or/and ES, to ensure it’s from Redis side you can try to reproduce the probelm and then flush everything in Redis.
I think it’s edx-search repo that is responsible for the content /courses page The default case in tutor.
I think it’s edx-search repo that is responsible for the content /courses page
This is also true ^ (if FEATURES['ENABLE_COURSE_DISCOVERY'] = True)
From what I’ve got from the logs/debugging I see that the problem is that the wrong data is sent to the ES for storing. The task responsible for that sending receives outdated course data from mongo.
I’m wondering now how I could confirm or refute @regis clues about the race condition
My 2 cent, is that the task to update to ES index is triggerd before the content is updated _i.e. there are two async tasks of which one depends on the other, so it might be that those tasks are running in parallel, instead in a chain or pipeline _ I am just speculating here
It might be a race condition as @regis indicated, where workers aren’t seeing the data that’s modified by the web view. This is especially possible if it’s reading from CourseOverviews.
Another race-condition-adjacent possibility that might be worth looking into is whether or not there is clock drift between the server running the web frontend, the server running the celery workers, and possibly MongoDB (I don’t remember exactly how the modified timestamp is created). The current timestamp is getting passed to the indexing function, which implies that it’s using the timestamp to narrow down the range of content changes it has to index. If the clocks are different, the workers may be looking at the wrong time window and not see the change.
Another potential race condition can happen if MongoDB is configured to read from secondaries–that’s usually the best thing to do for spreading load, but there could be propagation delay that prevents the workers from seeing it. It’s possible you’re not seeing that effect when everything is run in process because Modulestore’s internal caching is preventing it from re-fetching the data.