Celery tasks are broken

Let’s go nuclear and disable the cairn plugin. We’ll see whether it’s breaking the cms-worker. This will not delete your Cairn data.

tutor plugins disable cairn
tutor local quickstart -I
tutor local logs --tail=100 -f cms-worker

After that, let’s wait for a couple seconds, until the redis has been loaded to memory and the cms-worker is ready to pick up tasks. Then, trigger a course export. Is it working?

OK, gotta check with my colleagues if anyone is using Cairn at the moment, though, so I can’t do it right away. I’ll also trigger an image build to reduce downtime :wink:

Should there be a tutor config save in between, or does quickstart -I include that?

Yes, quickstart does include a save.

OK, disabled cairn, did the quickstart, everything is running, but course export is still not working :flushed:

Didn’t see that one coming…

CMS worker logs:

cms-worker_1                 | worker: Warm shutdown (MainProcess)
cms-worker_1                 |  
cms-worker_1                 |  -------------- celery@edx.cms.core.default.%7a28f7644e28 v5.2.6 (dawn-chorus)
cms-worker_1                 | --- ***** ----- 
cms-worker_1                 | -- ******* ---- Linux-5.15.54-x86_64-with-glibc2.29 2022-10-17 11:52:11
cms-worker_1                 | - *** --- * --- 
cms-worker_1                 | - ** ---------- [config]
cms-worker_1                 | - ** ---------- .> app:         proj:0x7f9633b56b20
cms-worker_1                 | - ** ---------- .> transport:   redis://redis:6379/0
cms-worker_1                 | - ** ---------- .> results:     
cms-worker_1                 | - *** --- * --- .> concurrency: 6 (prefork)
cms-worker_1                 | -- ******* ---- .> task events: ON
cms-worker_1                 | --- ***** ----- 
cms-worker_1                 |  -------------- [queues]
cms-worker_1                 |                 .> edx.cms.core.default exchange=edx.cms.core(direct) key=edx.cms.core.default
cms-worker_1                 |                 .> edx.cms.core.high exchange=edx.cms.core(direct) key=edx.cms.core.high
cms-worker_1                 |                 .> edx.cms.core.low exchange=edx.cms.core(direct) key=edx.cms.core.low
cms-worker_1                 | 
cms-worker_1                 | [tasks]
cms-worker_1                 |   . celery_utils.tasks.mark_resolved
cms-worker_1                 |   . cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.export_olx
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.import_olx
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.rerun_course
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_all_outlines_from_modulestore_task
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_library_index
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_search_index
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_special_exams_and_publish
cms-worker_1                 |   . cms.djangoapps.coursegraph.tasks.dump_course_to_neo4j
cms-worker_1                 |   . cms.djangoapps.export_course_metadata.tasks.export_course_metadata_task
cms-worker_1                 |   . common.djangoapps.entitlements.tasks.expire_old_entitlements
cms-worker_1                 |   . edx_name_affirmation.tasks.delete_verified_name_task
cms-worker_1                 |   . edx_name_affirmation.tasks.idv_update_verified_name_task
cms-worker_1                 |   . edx_name_affirmation.tasks.proctoring_update_verified_name_task
cms-worker_1                 |   . edx_sga.tasks.zip_student_submissions
cms-worker_1                 |   . enterprise.tasks.create_enterprise_enrollment
cms-worker_1                 |   . enterprise.tasks.send_enterprise_email_notification
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.cleanup_duplicate_assignment_records
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_content_metadata
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_single_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_single_subsection_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_subsection_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.unlink_inactive_learners
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.update_content_transmission_catalog
cms-worker_1                 |   . lms.djangoapps.certificates.tasks.generate_certificate
cms-worker_1                 |   . lms.djangoapps.discussion.tasks.send_ace_message
cms-worker_1                 |   . lms.djangoapps.discussion.tasks.update_discussions_map
cms-worker_1                 |   . lms.djangoapps.grades.tasks.compute_all_grades_for_course
cms-worker_1                 |   . lms.djangoapps.grades.tasks.compute_grades_for_course
cms-worker_1                 |   . lms.djangoapps.grades.tasks.compute_grades_for_course_v2
cms-worker_1                 |   . lms.djangoapps.grades.tasks.recalculate_course_and_subsection_grades_for_user
cms-worker_1                 |   . lms.djangoapps.grades.tasks.recalculate_subsection_grade_v3
cms-worker_1                 |   . lms.djangoapps.instructor.tasks.update_exam_completion_task
cms-worker_1                 |   . lms.djangoapps.verify_student.tasks.send_request_to_ss_for_user
cms-worker_1                 |   . lms.djangoapps.verify_student.tasks.send_verification_status_email
cms-worker_1                 |   . openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache_v2
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache_v2
cms-worker_1                 |   . openedx.core.djangoapps.content.course_overviews.tasks.async_course_overview_update
cms-worker_1                 |   . openedx.core.djangoapps.content_libraries.tasks.import_blocks_from_course
cms-worker_1                 |   . openedx.core.djangoapps.course_apps.tasks.cache_all_course_apps_status
cms-worker_1                 |   . openedx.core.djangoapps.course_apps.tasks.update_course_apps_status
cms-worker_1                 |   . openedx.core.djangoapps.credit.tasks.update_credit_course_requirements
cms-worker_1                 |   . openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task
cms-worker_1                 |   . openedx.core.djangoapps.heartbeat.tasks.sample_task
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.award_course_certificate
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.award_program_certificates
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.revoke_program_certificates
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.update_certificate_visible_date_on_course_update
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.update_credentials_course_certificate_configuration_available_date
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseNextSectionUpdate
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseUpdate
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleRecurringNudge
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleUpgradeReminder
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks._course_update_schedule_send
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks._recurring_nudge_schedule_send
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks._upgrade_reminder_schedule_send
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.update_course_schedules
cms-worker_1                 |   . openedx.core.djangoapps.service_status.tasks.delayed_ping
cms-worker_1                 |   . openedx.core.djangoapps.user_authn.tasks.check_pwned_password_and_send_track_event
cms-worker_1                 |   . openedx.core.djangoapps.user_authn.tasks.send_activation_email
cms-worker_1                 |   . openedx.core.djangoapps.verified_track_content.tasks.sync_cohort_with_mode
cms-worker_1                 |   . super_csv.mixins.do_deferred_commit
cms-worker_1                 |   . user_tasks.tasks.purge_old_user_tasks
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:16,518: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 2.00 seconds... (1/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:18,550: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 4.00 seconds... (2/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:22,561: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 6.00 seconds... (3/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:28,615: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 8.00 seconds... (4/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:36,722: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 10.00 seconds... (5/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:46,739: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 12.00 seconds... (6/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:58,833: INFO/MainProcess] Connected to redis://redis:6379/0
cms-worker_1                 | [2022-10-17 11:52:58,843: INFO/MainProcess] mingle: searching for neighbors
cms-worker_1                 | [2022-10-17 11:52:59,858: INFO/MainProcess] mingle: all alone
cms-worker_1                 | [2022-10-17 11:52:59,892: INFO/MainProcess] celery@edx.cms.core.default.%7a28f7644e28 ready.

OK so Cairn is not the culprit here. You may re-enable it if you want.

Let’s look at Redis. It’s unusual that it takes so long to load data in memory. How much data do you have? You can check by running:

ls -lh $(tutor config printroot)/data/redis

Quite some data:

tutor@production:~$ ls -lh $(tutor config printroot)/data/redis
total 3,9G
-rw-r--r-- 1 tutor systemd-coredump 3,6G okt 17 13:01 appendonly.aof
-rw-r--r-- 1 tutor tutor            296M okt 17 12:59 dump.rdb

Let’s try to remove all data from Redis:

tutor local stop redis
sudo mv $(tutor config printroot)/data/redis $(tutor config printroot)/data/redis.backup
tutor local restart redis cms-worker

After you’ve run these commands, is the course export feature working?

EDIT: I’ve attempted to reproduce the issue with a large Redis dataset, without success. Here’s what I did to fill the Redis cache:

tutor local run cms ./manage.py cms shell -c 'import string, random; from django.core.cache import cache
for _ in range(10**7):
    cache.set("dummy_" + "".join(random.choices(string.ascii_letters, k=256)), "".join(random.choices(string.ascii_letters, k=256)), timeout=3600); print(f"{_/10**5}%")'

I’ve also moved the Redis volume to a slower disk than my SSD.

Then, restart the redis and cms-worker containers:

tutor local restart redis cms-worker

I can see that the cms-worker is having trouble connecting to Redis:

tutor_local-cms-worker-1  | [2022-10-17 13:39:50,213: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
tutor_local-cms-worker-1  | Trying again in 2.00 seconds... (1/100)
tutor_local-cms-worker-1  | 
tutor_local-cms-worker-1  | [2022-10-17 13:39:52,221: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
tutor_local-cms-worker-1  | Trying again in 4.00 seconds... (2/100)

Eventually connection is successful:

tutor_local-cms-worker-1  | [2022-10-17 13:39:56,229: INFO/MainProcess] Connected to redis://redis:6379/0
tutor_local-cms-worker-1  | [2022-10-17 13:39:56,232: INFO/MainProcess] mingle: searching for neighbors
tutor_local-cms-worker-1  | [2022-10-17 13:39:57,241: INFO/MainProcess] mingle: all alone

I am then able to successfuly trigger a course export from the Studio. Alternatively, we can trigger an artificial asynchronous task by running:

tutor local run cms ./manage.py cms shell -c 'from cms.djangoapps.contentstore.tasks import update_search_index; update_search_index.delay("course-v1:edX+DemoX+Demo_Course", "2022-10-17T15:00:00.00")'

We can see that the task is correctly picked up by the cms-worker.

At this point I really don’t know what to do to reproduce the error…

Well, even though you can’t reproduce it, removing the Redis data seems to have solved the issue. I did another quickstart to make sure there was a new Redis directory with the correct ownership rights, and then everything worked…

Just to be sure I ran tutor local restart redis cms-worker again, and exporting courses still works.

So I’m sorry you can’t reproduce the issue, but I’m eternally grateful for your help solving it!

Am I correct in assuming we don’t need the old redis data, because it’s mostly cache? Or does it contain something important?

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.