The previous (outdated) image is shown on the /courses page, but it is up to date on all pages that read the data from the DB rather than from the index (e.g. the course About page).
The task itself looks OK, but for some reason the data it receives from the modulestore is outdated (e.g. the course_image_url(course) call returns the old image path). The triggered-at time in the logs is correct, and the queue looks correct too.
Restarting the workers fixes the issue for some time (at least one correct update).
Replacing <task>.delay with <task>.apply_async and setting a countdown (I chose a 5-second countdown arbitrarily) fixes the issue permanently.
The issue isn't reproducible when the task is called as a regular function (without .delay or .apply_async).
Adding a time.sleep before fetching the course object (I tried delays of up to 15 seconds) has no effect: the course image is always outdated.
The issue doesn't reproduce when stepping through the code in a debugger (in the CourseDetails.update_from_json method). The course_published signal fires twice at some point (I haven't found the reason yet).
I didn't find any caching issues when fetching the course data from Mongo for the index update.
My 2 cents: the task that updates the ES index is triggered before the content is updated, i.e. there are two async tasks of which one depends on the other, and they may be running in parallel instead of in a chain or pipeline. I am just speculating here.
It might be a race condition as @regis indicated, where workers aren’t seeing the data that’s modified by the web view. This is especially possible if it’s reading from CourseOverviews.
Another race-condition-adjacent possibility worth looking into is whether there is clock drift between the server running the web frontend, the server running the celery workers, and possibly MongoDB (I don't remember exactly how the modified timestamp is created). The current timestamp is passed to the indexing function, which implies it uses that timestamp to narrow down the range of content changes it has to index. If the clocks differ, the workers may be looking at the wrong time window and miss the change.
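A toy illustration of how drift could hide a change — all names here are illustrative, not the actual edX indexing code. If the indexer filters on "modified at or after the trigger time" and the worker's clock runs ahead of the machine that stamped the change, the fresh change falls outside the window:

```python
from datetime import datetime, timedelta, timezone

def changed_since(items, triggered_at):
    # Hypothetical indexer view: only re-index items modified at or after triggered_at.
    return [i for i in items if i["modified"] >= triggered_at]

web_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
items = [{"id": "course_image", "modified": web_now}]  # stamped on the web server

# If the worker's clock runs 30 seconds ahead, its "now" excludes the fresh change.
worker_now = web_now + timedelta(seconds=30)

print(changed_since(items, web_now))     # change is visible
print(changed_since(items, worker_now))  # change is missed
```

Checking NTP sync across the hosts would quickly confirm or rule this out.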
Another potential race condition can happen if MongoDB is configured to read from secondaries. That's usually the best thing to do for spreading load, but there can be replication delay that prevents the workers from seeing the write. You may not be seeing that effect when everything runs in-process because the modulestore's internal caching prevents it from re-fetching the data.
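That setting is controlled by the read preference on the Mongo connection. A sketch of the two options as connection strings (hostnames and database name are hypothetical):

```
# Reading from secondaries spreads load but tolerates replication lag,
# so a worker can read stale data right after the web process writes:
mongodb://mongo1,mongo2,mongo3/edxapp?replicaSet=rs0&readPreference=secondaryPreferred

# Forcing primary reads for the indexing workers removes that lag window:
mongodb://mongo1,mongo2,mongo3/edxapp?replicaSet=rs0&readPreference=primary
```

If the workers' read preference turns out to be secondaryPreferred, switching just the worker connection to primary would be a cheap way to test this hypothesis.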