Celery tasks are broken

Yes, the system is now broken again.

Redis logs:

tutor@production:~$ tutor local logs --tail=100 redis
docker-compose -f /home/tutor/.local/share/tutor/env/local/docker-compose.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.prod.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.tmp.yml --project-name tutor_local logs --tail 100 redis
Attaching to tutor_local_redis_1
redis_1                         | 1540:C 17 Oct 2022 08:00:38.191 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:00:38.225 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:05:39.032 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:05:39.043 * Background saving started by pid 1541
redis_1                         | 1541:C 17 Oct 2022 08:05:45.200 * DB saved on disk
redis_1                         | 1541:C 17 Oct 2022 08:05:45.214 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:05:45.279 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:10:46.097 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:10:46.107 * Background saving started by pid 1542
redis_1                         | 1542:C 17 Oct 2022 08:10:51.402 * DB saved on disk
redis_1                         | 1542:C 17 Oct 2022 08:10:51.415 * RDB: 1 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:10:51.439 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:15:52.082 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:15:52.095 * Background saving started by pid 1543
redis_1                         | 1543:C 17 Oct 2022 08:15:58.097 * DB saved on disk
redis_1                         | 1543:C 17 Oct 2022 08:15:58.112 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:15:58.138 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:20:59.052 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:20:59.065 * Background saving started by pid 1544
redis_1                         | 1544:C 17 Oct 2022 08:21:04.817 * DB saved on disk
redis_1                         | 1544:C 17 Oct 2022 08:21:04.830 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:21:04.915 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:26:05.011 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:26:05.026 * Background saving started by pid 1545
redis_1                         | 1545:C 17 Oct 2022 08:26:11.200 * DB saved on disk
redis_1                         | 1545:C 17 Oct 2022 08:26:11.218 * RDB: 3 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:26:11.282 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:31:12.075 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:31:12.085 * Background saving started by pid 1546
redis_1                         | 1546:C 17 Oct 2022 08:31:17.493 * DB saved on disk
redis_1                         | 1546:C 17 Oct 2022 08:31:17.506 * RDB: 1 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:31:17.522 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:36:18.076 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:36:18.088 * Background saving started by pid 1547
redis_1                         | 1547:C 17 Oct 2022 08:36:23.456 * DB saved on disk
redis_1                         | 1547:C 17 Oct 2022 08:36:23.475 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:36:23.519 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:41:24.054 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:41:24.064 * Background saving started by pid 1548
redis_1                         | 1548:C 17 Oct 2022 08:41:29.303 * DB saved on disk
redis_1                         | 1548:C 17 Oct 2022 08:41:29.315 * RDB: 3 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:41:29.397 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:46:30.064 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:46:30.074 * Background saving started by pid 1549
redis_1                         | 1549:C 17 Oct 2022 08:46:35.634 * DB saved on disk
redis_1                         | 1549:C 17 Oct 2022 08:46:35.647 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:46:35.719 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:51:36.098 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:51:36.111 * Background saving started by pid 1550
redis_1                         | 1550:C 17 Oct 2022 08:51:41.199 * DB saved on disk
redis_1                         | 1550:C 17 Oct 2022 08:51:41.211 * RDB: 1 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:51:41.245 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:56:42.024 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:56:42.042 * Background saving started by pid 1551
redis_1                         | 1551:C 17 Oct 2022 08:56:47.326 * DB saved on disk
redis_1                         | 1551:C 17 Oct 2022 08:56:47.340 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:56:47.375 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 09:01:48.050 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:01:48.064 * Background saving started by pid 1552
redis_1                         | 1552:C 17 Oct 2022 09:01:53.227 * DB saved on disk
redis_1                         | 1552:C 17 Oct 2022 09:01:53.239 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:01:53.298 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 09:06:54.063 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:06:54.075 * Background saving started by pid 1553
redis_1                         | 1553:C 17 Oct 2022 09:07:00.243 * DB saved on disk
redis_1                         | 1553:C 17 Oct 2022 09:07:00.264 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:07:00.322 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 09:12:01.087 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:12:01.099 * Background saving started by pid 1554
redis_1                         | 1554:C 17 Oct 2022 09:12:07.178 * DB saved on disk
redis_1                         | 1554:C 17 Oct 2022 09:12:07.194 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:12:07.241 * Background saving terminated with success
redis_1                         | 1:signal-handler (1665998184) Received SIGTERM scheduling shutdown...
redis_1                         | 1:M 17 Oct 2022 09:16:24.918 # User requested shutdown...
redis_1                         | 1:M 17 Oct 2022 09:16:24.918 * Calling fsync() on the AOF file.
redis_1                         | 1:M 17 Oct 2022 09:16:24.927 * Saving the final RDB snapshot before exiting.
redis_1                         | 1:M 17 Oct 2022 09:16:30.529 * DB saved on disk
redis_1                         | 1:M 17 Oct 2022 09:16:30.529 * Removing the pid file.
redis_1                         | 1:M 17 Oct 2022 09:16:30.529 # Redis is now ready to exit, bye bye...
redis_1                         | 1:C 17 Oct 2022 09:16:31.557 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                         | 1:C 17 Oct 2022 09:16:31.557 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                         | 1:C 17 Oct 2022 09:16:31.557 # Configuration loaded
redis_1                         | 1:M 17 Oct 2022 09:16:31.559 * monotonic clock: POSIX clock_gettime
redis_1                         | 1:M 17 Oct 2022 09:16:31.562 * Running mode=standalone, port=6379.
redis_1                         | 1:M 17 Oct 2022 09:16:31.562 # Server initialized
redis_1                         | 1:M 17 Oct 2022 09:16:31.562 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * Reading RDB preamble from AOF file...
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * Loading RDB produced by version 6.2.1
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * RDB age 6305295 seconds
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * RDB memory usage when created 280.22 Mb
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * RDB has an AOF tail
redis_1                         | 1:M 17 Oct 2022 09:16:35.119 # Done loading RDB, keys loaded: 44250, keys expired: 0.
redis_1                         | 1:M 17 Oct 2022 09:16:35.119 * Reading the remaining AOF tail...
redis_1                         | 1:M 17 Oct 2022 09:17:26.066 * DB loaded from append only file: 54.504 seconds
redis_1                         | 1:M 17 Oct 2022 09:17:26.066 * Ready to accept connections
redis_1                         | 1:M 17 Oct 2022 09:17:32.011 * 10000 changes in 60 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:17:32.024 * Background saving started by pid 13
redis_1                         | 13:C 17 Oct 2022 09:17:37.715 * DB saved on disk
redis_1                         | 13:C 17 Oct 2022 09:17:37.731 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:17:37.765 * Background saving terminated with success

CMS worker logs:

tutor@production:~$ tutor local logs --tail=100 cms-worker
docker-compose -f /home/tutor/.local/share/tutor/env/local/docker-compose.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.prod.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.tmp.yml --project-name tutor_local logs --tail 100 cms-worker
Attaching to tutor_local_cms-worker_1
cms-worker_1                    | - ** ---------- .> results:     
cms-worker_1                    | - *** --- * --- .> concurrency: 6 (prefork)
cms-worker_1                    | -- ******* ---- .> task events: ON
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    |  -------------- [queues]
cms-worker_1                    |                 .> edx.cms.core.default exchange=edx.cms.core(direct) key=edx.cms.core.default
cms-worker_1                    |                 .> edx.cms.core.high exchange=edx.cms.core(direct) key=edx.cms.core.high
cms-worker_1                    |                 .> edx.cms.core.low exchange=edx.cms.core(direct) key=edx.cms.core.low
cms-worker_1                    | 
cms-worker_1                    | [tasks]
cms-worker_1                    |   . celery_utils.tasks.mark_resolved
cms-worker_1                    |   . cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.export_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.import_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.rerun_course
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_all_outlines_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_library_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_search_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_special_exams_and_publish
cms-worker_1                    |   . cms.djangoapps.coursegraph.tasks.dump_course_to_neo4j
cms-worker_1                    |   . cms.djangoapps.export_course_metadata.tasks.export_course_metadata_task
cms-worker_1                    |   . common.djangoapps.entitlements.tasks.expire_old_entitlements
cms-worker_1                    |   . edx_name_affirmation.tasks.delete_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.idv_update_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.proctoring_update_verified_name_task
cms-worker_1                    |   . edx_sga.tasks.zip_student_submissions
cms-worker_1                    |   . enterprise.tasks.create_enterprise_enrollment
cms-worker_1                    |   . enterprise.tasks.send_enterprise_email_notification
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.cleanup_duplicate_assignment_records
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_content_metadata
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.unlink_inactive_learners
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.update_content_transmission_catalog
cms-worker_1                    |   . lms.djangoapps.certificates.tasks.generate_certificate
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.send_ace_message
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.update_discussions_map
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_all_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course_v2
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_course_and_subsection_grades_for_user
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_subsection_grade_v3
cms-worker_1                    |   . lms.djangoapps.instructor.tasks.update_exam_completion_task
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_request_to_ss_for_user
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_verification_status_email
cms-worker_1                    |   . openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.course_overviews.tasks.async_course_overview_update
cms-worker_1                    |   . openedx.core.djangoapps.content_libraries.tasks.import_blocks_from_course
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.cache_all_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.update_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.credit.tasks.update_credit_course_requirements
cms-worker_1                    |   . openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task
cms-worker_1                    |   . openedx.core.djangoapps.heartbeat.tasks.sample_task
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_course_certificate
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.revoke_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_certificate_visible_date_on_course_update
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_credentials_course_certificate_configuration_available_date
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseNextSectionUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleRecurringNudge
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleUpgradeReminder
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._course_update_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._recurring_nudge_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._upgrade_reminder_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.update_course_schedules
cms-worker_1                    |   . openedx.core.djangoapps.service_status.tasks.delayed_ping
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.check_pwned_password_and_send_track_event
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.send_activation_email
cms-worker_1                    |   . openedx.core.djangoapps.verified_track_content.tasks.sync_cohort_with_mode
cms-worker_1                    |   . super_csv.mixins.do_deferred_commit
cms-worker_1                    |   . user_tasks.tasks.purge_old_user_tasks
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:16:59,345: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 2.00 seconds... (1/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:01,351: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 4.00 seconds... (2/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:05,374: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 6.00 seconds... (3/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:11,386: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 8.00 seconds... (4/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:19,409: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 10.00 seconds... (5/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:29,433: INFO/MainProcess] Connected to redis://redis:6379/0
cms-worker_1                    | [2022-10-17 09:17:34,444: INFO/MainProcess] mingle: searching for neighbors
cms-worker_1                    | [2022-10-17 09:17:35,485: INFO/MainProcess] mingle: sync with 2 nodes
cms-worker_1                    | [2022-10-17 09:17:35,490: INFO/MainProcess] mingle: sync complete
cms-worker_1                    | [2022-10-17 09:17:35,550: INFO/MainProcess] celery@edx.cms.core.default.%7a28f7644e28 ready.

Redis logs after docker restart <cms worker id>:

redis_1                         | 1:M 17 Oct 2022 09:22:38.069 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:22:38.080 * Background saving started by pid 14
redis_1                         | 14:C 17 Oct 2022 09:22:43.624 * DB saved on disk
redis_1                         | 14:C 17 Oct 2022 09:22:43.637 * RDB: 1 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:22:43.723 * Background saving terminated with success

CMS logs after cms worker restart:

cms-worker_1                    |  -------------- celery@edx.cms.core.default.%7a28f7644e28 v5.2.6 (dawn-chorus)
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    | -- ******* ---- Linux-5.15.54-x86_64-with-glibc2.29 2022-10-17 09:22:57
cms-worker_1                    | - *** --- * --- 
cms-worker_1                    | - ** ---------- [config]
cms-worker_1                    | - ** ---------- .> app:         proj:0x7fa59f9f4ac0
cms-worker_1                    | - ** ---------- .> transport:   redis://redis:6379/0
cms-worker_1                    | - ** ---------- .> results:     
cms-worker_1                    | - *** --- * --- .> concurrency: 6 (prefork)
cms-worker_1                    | -- ******* ---- .> task events: ON
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    |  -------------- [queues]
cms-worker_1                    |                 .> edx.cms.core.default exchange=edx.cms.core(direct) key=edx.cms.core.default
cms-worker_1                    |                 .> edx.cms.core.high exchange=edx.cms.core(direct) key=edx.cms.core.high
cms-worker_1                    |                 .> edx.cms.core.low exchange=edx.cms.core(direct) key=edx.cms.core.low
cms-worker_1                    | 
cms-worker_1                    | [tasks]
cms-worker_1                    |   . celery_utils.tasks.mark_resolved
cms-worker_1                    |   . cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.export_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.import_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.rerun_course
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_all_outlines_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_library_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_search_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_special_exams_and_publish
cms-worker_1                    |   . cms.djangoapps.coursegraph.tasks.dump_course_to_neo4j
cms-worker_1                    |   . cms.djangoapps.export_course_metadata.tasks.export_course_metadata_task
cms-worker_1                    |   . common.djangoapps.entitlements.tasks.expire_old_entitlements
cms-worker_1                    |   . edx_name_affirmation.tasks.delete_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.idv_update_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.proctoring_update_verified_name_task
cms-worker_1                    |   . edx_sga.tasks.zip_student_submissions
cms-worker_1                    |   . enterprise.tasks.create_enterprise_enrollment
cms-worker_1                    |   . enterprise.tasks.send_enterprise_email_notification
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.cleanup_duplicate_assignment_records
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_content_metadata
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.unlink_inactive_learners
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.update_content_transmission_catalog
cms-worker_1                    |   . lms.djangoapps.certificates.tasks.generate_certificate
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.send_ace_message
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.update_discussions_map
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_all_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course_v2
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_course_and_subsection_grades_for_user
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_subsection_grade_v3
cms-worker_1                    |   . lms.djangoapps.instructor.tasks.update_exam_completion_task
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_request_to_ss_for_user
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_verification_status_email
cms-worker_1                    |   . openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.course_overviews.tasks.async_course_overview_update
cms-worker_1                    |   . openedx.core.djangoapps.content_libraries.tasks.import_blocks_from_course
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.cache_all_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.update_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.credit.tasks.update_credit_course_requirements
cms-worker_1                    |   . openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task
cms-worker_1                    |   . openedx.core.djangoapps.heartbeat.tasks.sample_task
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_course_certificate
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.revoke_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_certificate_visible_date_on_course_update
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_credentials_course_certificate_configuration_available_date
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseNextSectionUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleRecurringNudge
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleUpgradeReminder
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._course_update_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._recurring_nudge_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._upgrade_reminder_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.update_course_schedules
cms-worker_1                    |   . openedx.core.djangoapps.service_status.tasks.delayed_ping
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.check_pwned_password_and_send_track_event
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.send_activation_email
cms-worker_1                    |   . openedx.core.djangoapps.verified_track_content.tasks.sync_cohort_with_mode
cms-worker_1                    |   . super_csv.mixins.do_deferred_commit
cms-worker_1                    |   . user_tasks.tasks.purge_old_user_tasks
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:23:05,557: INFO/MainProcess] Connected to redis://redis:6379/0
cms-worker_1                    | [2022-10-17 09:23:05,567: INFO/MainProcess] mingle: searching for neighbors
cms-worker_1                    | [2022-10-17 09:23:06,593: INFO/MainProcess] mingle: sync with 2 nodes
cms-worker_1                    | [2022-10-17 09:23:06,596: INFO/MainProcess] mingle: sync complete
cms-worker_1                    | [2022-10-17 09:23:06,641: INFO/MainProcess] celery@edx.cms.core.default.%7a28f7644e28 ready.
cms-worker_1                    | [2022-10-17 09:23:06,646: INFO/MainProcess] Task lms.djangoapps.grades.tasks.compute_all_grades_for_course[5d72f98b-99b3-402f-bca4-e065524526f2] received
cms-worker_1                    | [2022-10-17 09:23:06,652: INFO/MainProcess] Task cms.djangoapps.contentstore.tasks.export_olx[2b8bda19-5c2a-43e6-ac39-0c7ea0e6dbe3] received
cms-worker_1                    | [2022-10-17 09:23:06,655: INFO/MainProcess] Task cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79] received
cms-worker_1                    | [2022-10-17 09:23:06,658: INFO/MainProcess] Task cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task[c21ec78d-151e-4c4c-9682-824e5507338e] received
cms-worker_1                    | [2022-10-17 09:23:06,666: INFO/ForkPoolWorker-4] cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79]: Attempting to register exams for course course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,667: INFO/ForkPoolWorker-4] cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79]: Successfully registered exams for course course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,667: INFO/ForkPoolWorker-4] cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79]: Publishing course course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,668: INFO/MainProcess] Task cms.djangoapps.contentstore.tasks.update_search_index[1a2b9790-aa58-4765-b4fa-bd09ac1f4077] received
cms-worker_1                    | [2022-10-17 09:23:06,677: INFO/MainProcess] Task openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task[b4561790-e99d-44f1-a65d-2ab6fc358a06] received
cms-worker_1                    | [2022-10-17 09:23:06,685: INFO/MainProcess] Task openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache[8f89fc2a-0ab1-406f-9692-7205252d7861] received
cms-worker_1                    | [2022-10-17 09:23:06,689: INFO/MainProcess] Task openedx.core.djangoapps.course_apps.tasks.update_course_apps_status[8875d9b9-9ff6-4f11-b547-c13bdf19e454] received
cms-worker_1                    | [2022-10-17 09:23:06,696: INFO/MainProcess] Task lms.djangoapps.discussion.tasks.update_discussions_map[1a63e555-7608-47d3-bff3-aa1bfff3c8a9] received
cms-worker_1                    | [2022-10-17 09:23:06,696: INFO/ForkPoolWorker-5] Updating discussion settings for course: course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,747: INFO/ForkPoolWorker-4] Task cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79] succeeded in 0.09090760699473321s: None
cms-worker_1                    | [2022-10-17 09:23:06,754: WARNING/ForkPoolWorker-3] Grades: Flag 'grades.enforce_freeze_grade_after_course_end' accessed without a request, which is likely in the context of a celery task.
cms-worker_1                    | [2022-10-17 09:23:06,757: INFO/ForkPoolWorker-4] Starting XBlockCaches update for course_key: course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,767: INFO/ForkPoolWorker-3] Attempted compute_all_grades_for_course for course 'course-v1:Totem+BOT101+course', but grades are frozen.
cms-worker_1                    | [2022-10-17 09:23:06,770: INFO/ForkPoolWorker-3] Task lms.djangoapps.grades.tasks.compute_all_grades_for_course[5d72f98b-99b3-402f-bca4-e065524526f2] succeeded in 0.11385197890922427s: None
cms-worker_1                    | [2022-10-17 09:23:06,776: INFO/ForkPoolWorker-3] openedx.core.djangoapps.course_apps.tasks.update_course_apps_status[8875d9b9-9ff6-4f11-b547-c13bdf19e454]: Caching course apps status for course with id: course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,800: INFO/ForkPoolWorker-2] HEAD http://elasticsearch:9200/courseware_content [status:200 request:0.010s]
cms-worker_1                    | [2022-10-17 09:23:06,944: INFO/ForkPoolWorker-2] HEAD http://elasticsearch:9200/course_info [status:200 request:0.005s]
cms-worker_1                    | [2022-10-17 09:23:06,985: INFO/ForkPoolWorker-2] GET http://elasticsearch:9200/course_info/_mapping [status:200 request:0.003s]
cms-worker_1                    | [2022-10-17 09:23:07,012: INFO/ForkPoolWorker-2] POST http://elasticsearch:9200/_bulk [status:200 request:0.021s]
cms-worker_1                    | [2022-10-17 09:23:08,057: INFO/ForkPoolWorker-2] GET http://elasticsearch:9200/courseware_content/_mapping [status:200 request:0.003s]
cms-worker_1                    | [2022-10-17 09:23:08,116: INFO/ForkPoolWorker-2] POST http://elasticsearch:9200/_bulk [status:200 request:0.035s]
cms-worker_1                    | [2022-10-17 09:23:08,130: INFO/ForkPoolWorker-2] POST http://elasticsearch:9200/courseware_content/_search [status:200 request:0.011s]
cms-worker_1                    | [2022-10-17 09:23:08,132: INFO/ForkPoolWorker-2] Task cms.djangoapps.contentstore.tasks.update_search_index[1a2b9790-aa58-4765-b4fa-bd09ac1f4077] succeeded in 1.4569336449494585s: None
cms-worker_1                    | [2022-10-17 09:23:08,182: INFO/ForkPoolWorker-2] Task lms.djangoapps.discussion.tasks.update_discussions_map[1a63e555-7608-47d3-bff3-aa1bfff3c8a9] succeeded in 0.04591151303611696s: None
cms-worker_1                    | [2022-10-17 09:23:11,919: INFO/ForkPoolWorker-5] Updating existing discussion topic links for course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:11,925: INFO/ForkPoolWorker-5] Creating new discussion topic links for course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:11,931: INFO/ForkPoolWorker-5] Responses of the Open edX Event <org.openedx.learning.discussions.configuration.changed.v1>: 
cms-worker_1                    | [(<function handle_course_discussion_config_update at 0x7fa5728d1f70>, None)]
cms-worker_1                    | [2022-10-17 09:23:11,933: INFO/ForkPoolWorker-5] Task openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task[b4561790-e99d-44f1-a65d-2ab6fc358a06] succeeded in 5.246437182999216s: None
cms-worker_1                    | [2022-10-17 09:23:11,965: INFO/ForkPoolWorker-1] Replacing CourseOutline for course-v1:Totem+TP_SD_FR+01 (version 634d11d481aec31389a88f93, 12 sequences)
cms-worker_1                    | [2022-10-17 09:23:11,978: INFO/ForkPoolWorker-1] Found CourseContext for course-v1:Totem+TP_SD_FR+01, updating...
cms-worker_1                    | [2022-10-17 09:23:12,042: WARNING/ForkPoolWorker-3] Flag 'course_live.enable_course_live' accessed without a request, which is likely in the context of a celery task.
cms-worker_1                    | [2022-10-17 09:23:12,055: WARNING/ForkPoolWorker-3] Flag 'teams.enable_teams_app' accessed without a request, which is likely in the context of a celery task.
cms-worker_1                    | [2022-10-17 09:23:12,086: INFO/ForkPoolWorker-3] Task openedx.core.djangoapps.course_apps.tasks.update_course_apps_status[8875d9b9-9ff6-4f11-b547-c13bdf19e454] succeeded in 5.311822294024751s: None
cms-worker_1                    | [2022-10-17 09:23:12,138: INFO/ForkPoolWorker-4] Updating XBlockCache with usage_key: block-v1:Totem+TP_SD_FR+01+type@vertical+block@073c8cdf616f4dbe9035bc76d3415a78
cms-worker_1                    | [2022-10-17 09:23:12,143: INFO/ForkPoolWorker-4] Updating XBlockCache with usage_key: block-v1:Totem+TP_SD_FR+01+type@html+block@9b3b90445210463fa900218b9d408be5
cms-worker_1                    | [2022-10-17 09:23:12,146: INFO/ForkPoolWorker-4] Updating XBlockCache with usage_key: block-v1:Totem+TP_SD_FR+01+type@vertical+block@ba7bb75754b14485a5153fdee4808d1c
cms-worker_1                    | [2022-10-17 09:23:12,148: INFO/ForkPoolWorker-4] Updating XBlockCache with usage_key: block-v1:Totem+TP_SD_FR+01+type@html+block@68e2796c9a4243b88a531525e7a9ed27
cms-worker_1                    | [2022-10-17 09:23:12,179: INFO/ForkPoolWorker-4] Ending XBlockCaches update for course_key: course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:12,182: INFO/ForkPoolWorker-4] Task openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache[8f89fc2a-0ab1-406f-9692-7205252d7861] succeeded in 5.425541295087896s: None
cms-worker_1                    | [2022-10-17 09:23:12,206: INFO/ForkPoolWorker-1] Task cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task[c21ec78d-151e-4c4c-9682-824e5507338e] succeeded in 5.535299669019878s: None
cms-worker_1                    | [2022-10-17 09:23:17,800: INFO/MainProcess] Task cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email[aa284a42-f396-4927-aa9d-c71b6e5f4290] received
cms-worker_1                    | [2022-10-17 09:23:17,801: INFO/ForkPoolWorker-6] Task cms.djangoapps.contentstore.tasks.export_olx[2b8bda19-5c2a-43e6-ac39-0c7ea0e6dbe3] succeeded in 11.094052562024444s: None
cms-worker_1                    | [2022-10-17 09:23:17,963: INFO/ForkPoolWorker-3] cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email[aa284a42-f396-4927-aa9d-c71b6e5f4290]: Task complete email has been sent to User x@y.com
cms-worker_1                    | [2022-10-17 09:23:17,966: INFO/ForkPoolWorker-3] Task cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email[aa284a42-f396-4927-aa9d-c71b6e5f4290] succeeded in 0.16261423006653786s: None

In a broken state, what are the logs coming from the cairn-superset-worker container?

tutor local logs --tail=100 cairn-superset-worker

Might be on to something here:

Logs before “breaking”:

cairn-superset-worker_1         | [2022-10-17 10:16:00,065: INFO/MainProcess] Received task: reports.scheduler[0f512dce-bd4a-4dee-9d9e-380a37b5267e]  
cairn-superset-worker_1         | [2022-10-17 10:16:00,108: INFO/ForkPoolWorker-3] Task reports.scheduler[0f512dce-bd4a-4dee-9d9e-380a37b5267e] succeeded in 0.040656305965967476s: None

Logs after “breaking”:

cairn-superset-worker_1         | [2022-10-17 10:18:01,290: INFO/MainProcess] missed heartbeat from celery@edx.lms.core.default.%439d57b0205f
cairn-superset-worker_1         | [2022-10-17 10:18:01,292: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
cairn-superset-worker_1         | Traceback (most recent call last):
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 706, in send_packed_command
cairn-superset-worker_1         |     sendall(self._sock, item)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/_compat.py", line 9, in sendall
cairn-superset-worker_1         |     return sock.sendall(*args, **kwargs)
cairn-superset-worker_1         | BrokenPipeError: [Errno 32] Broken pipe
cairn-superset-worker_1         | 
cairn-superset-worker_1         | During handling of the above exception, another exception occurred:
cairn-superset-worker_1         | 
cairn-superset-worker_1         | Traceback (most recent call last):
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 318, in start
cairn-superset-worker_1         |     blueprint.start(self)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/celery/bootsteps.py", line 119, in start
cairn-superset-worker_1         |     step.start(parent)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 599, in start
cairn-superset-worker_1         |     c.loop(*c.loop_args())
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/celery/worker/loops.py", line 83, in asynloop
cairn-superset-worker_1         |     next(loop)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/asynchronous/hub.py", line 364, in create_loop
cairn-superset-worker_1         |     cb(*cbargs)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 1083, in on_readable
cairn-superset-worker_1         |     self.cycle.on_readable(fileno)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 354, in on_readable
cairn-superset-worker_1         |     chan.handlers[type]()
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 688, in _receive
cairn-superset-worker_1         |     ret.append(self._receive_one(c))
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 698, in _receive_one
cairn-superset-worker_1         |     response = c.parse_response()
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 3501, in parse_response
cairn-superset-worker_1         |     self.check_health()
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 3522, in check_health
cairn-superset-worker_1         |     check_health=False)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 726, in send_command
cairn-superset-worker_1         |     check_health=kwargs.get('check_health', True))
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 718, in send_packed_command
cairn-superset-worker_1         |     (errno, errmsg))
cairn-superset-worker_1         | redis.exceptions.ConnectionError: Error 32 while writing to socket. Broken pipe.
cairn-superset-worker_1         | [2022-10-17 10:18:01,317: INFO/MainProcess] Connected to redis://redis:6379/0
cairn-superset-worker_1         | [2022-10-17 10:18:01,329: INFO/MainProcess] mingle: searching for neighbors
cairn-superset-worker_1         | [2022-10-17 10:18:02,338: INFO/MainProcess] mingle: all alone
cairn-superset-worker_1         | [2022-10-17 10:18:02,373: INFO/MainProcess] Received task: reports.scheduler[a3f8ca63-297a-48ab-96f2-196521334496]  
cairn-superset-worker_1         | [2022-10-17 10:18:02,412: INFO/ForkPoolWorker-3] Task reports.scheduler[a3f8ca63-297a-48ab-96f2-196521334496] succeeded in 0.032858264981769025s: None
cairn-superset-worker_1         | [2022-10-17 10:18:12,073: INFO/MainProcess] sync with celery@edx.cms.core.default.%7a28f7644e28
cairn-superset-worker_1         | [2022-10-17 10:19:00,037: INFO/MainProcess] Received task: reports.scheduler[b03a01c6-fe4f-42a2-8301-49396b539c7b]  
cairn-superset-worker_1         | [2022-10-17 10:19:00,063: INFO/ForkPoolWorker-3] Task reports.scheduler[b03a01c6-fe4f-42a2-8301-49396b539c7b] succeeded in 0.023648836999200284s: None
cairn-superset-worker_1         | [2022-10-17 10:16:00,065: INFO/MainProcess] Received task: reports.scheduler[0f512dce-bd4a-4dee-9d9e-380a37b5267e]  
cairn-superset-worker_1         | [2022-10-17 10:16:00,108: INFO/ForkPoolWorker-3] Task reports.scheduler[0f512dce-bd4a-4dee-9d9e-380a37b5267e] succeeded in 0.040656305965967476s: None

So it just had the one error, but seems to pick up OK after that… Interestingly it skipped 10:17, which is around the time I ran tutor local restart redis cms-worker

Yes, the cairn worker is failing because it is (incorrectly) configured to connect to the default Redis instance, instead of the cairn-redis container. Thus, the cairn worker is failing when we restart the redis container. The mingle: sync with 2 nodes log line tipped me off.

The superset and the cms workers should really not be “mingling”. Let’s stop the cairn worker and see whether this resolves the issue:

tutor local stop cairn-superset-worker
tutor local restart redis cms-worker

After these commands, are you in a failed state?

I ran those commands, the CMS entered the failing state again. Maybe because the other superset containers are still running? like the cairn redis?

I also do still see a cairn worker running, after running that command:

docker ps | grep cairn
f94d7124a2c9   overhangio/cairn-superset:14.0.1     "/bin/sh -c 'gunicor…"   5 days ago   Up 5 days (healthy)   8088/tcp                                                                             tutor_local_cairn-superset_1
1afd8d16dfed   overhangio/cairn-superset:14.0.1     "celery beat --app=s…"   5 days ago   Up 5 days             8088/tcp                                                                             tutor_local_cairn-superset-worker-beat_1
663cb123fa47   postgres:9.6-alpine                  "docker-entrypoint.s…"   5 days ago   Up 5 days             5432/tcp                                                                             tutor_local_cairn-postgresql_1
67655edc04d7   overhangio/cairn-clickhouse:14.0.1   "/entrypoint.sh"         5 days ago   Up 5 days             8123/tcp, 9000/tcp, 9009/tcp                                                         tutor_local_cairn-clickhouse_1
c3a7fa385ec5   timberio/vector:0.18.1-alpine        "/usr/local/bin/vect…"   5 days ago   Up 5 days                                                                                                  tutor_local_cairn-vector_1
fbab7fa123e5   redis:6.2.4-alpine                   "docker-entrypoint.s…"   5 days ago   Up 5 days             6379/tcp                                                                             tutor_local_cairn-redis_1

Right. Let’s stop it as well and attempt to restart the cms-worker:

tutor local stop cairn-superset-worker cairn-superset-worker-beat
tutor local restart redis cms-worker

At this point, are we still in a failed state?

OK, so I did not run the docker restart <cms worker id> first, I just ran those commands. It’s still broken.

By the way, thank you a lot for your help so far!

Let’s go nuclear and disable the cairn plugin. We’ll see whether it’s breaking the cms-worker. This will not delete your Cairn data.

tutor plugins disable cairn
tutor local quickstart -I
tutor local logs --tail=100 -f cms-worker

After that, let’s wait for a couple seconds, until the redis has been loaded to memory and the cms-worker is ready to pick up tasks. Then, trigger a course export. Is it working?

OK, gotta check with my colleagues if anyone is using Cairn at the moment, though, so I can’t do it right away. I’ll also trigger an image build to reduce downtime :wink:

Should there be a tutor config save in between, or does quickstart -I include that?

Yes, quickstart does include a save.

OK, disabled cairn, did the quickstart, everything is running, but course export is still not working :flushed:

Didn’t see that one coming…

CMS worker logs:

cms-worker_1                 | worker: Warm shutdown (MainProcess)
cms-worker_1                 |  
cms-worker_1                 |  -------------- celery@edx.cms.core.default.%7a28f7644e28 v5.2.6 (dawn-chorus)
cms-worker_1                 | --- ***** ----- 
cms-worker_1                 | -- ******* ---- Linux-5.15.54-x86_64-with-glibc2.29 2022-10-17 11:52:11
cms-worker_1                 | - *** --- * --- 
cms-worker_1                 | - ** ---------- [config]
cms-worker_1                 | - ** ---------- .> app:         proj:0x7f9633b56b20
cms-worker_1                 | - ** ---------- .> transport:   redis://redis:6379/0
cms-worker_1                 | - ** ---------- .> results:     
cms-worker_1                 | - *** --- * --- .> concurrency: 6 (prefork)
cms-worker_1                 | -- ******* ---- .> task events: ON
cms-worker_1                 | --- ***** ----- 
cms-worker_1                 |  -------------- [queues]
cms-worker_1                 |                 .> edx.cms.core.default exchange=edx.cms.core(direct) key=edx.cms.core.default
cms-worker_1                 |                 .> edx.cms.core.high exchange=edx.cms.core(direct) key=edx.cms.core.high
cms-worker_1                 |                 .> edx.cms.core.low exchange=edx.cms.core(direct) key=edx.cms.core.low
cms-worker_1                 | 
cms-worker_1                 | [tasks]
cms-worker_1                 |   . celery_utils.tasks.mark_resolved
cms-worker_1                 |   . cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.export_olx
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.import_olx
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.rerun_course
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_all_outlines_from_modulestore_task
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_library_index
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_search_index
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_special_exams_and_publish
cms-worker_1                 |   . cms.djangoapps.coursegraph.tasks.dump_course_to_neo4j
cms-worker_1                 |   . cms.djangoapps.export_course_metadata.tasks.export_course_metadata_task
cms-worker_1                 |   . common.djangoapps.entitlements.tasks.expire_old_entitlements
cms-worker_1                 |   . edx_name_affirmation.tasks.delete_verified_name_task
cms-worker_1                 |   . edx_name_affirmation.tasks.idv_update_verified_name_task
cms-worker_1                 |   . edx_name_affirmation.tasks.proctoring_update_verified_name_task
cms-worker_1                 |   . edx_sga.tasks.zip_student_submissions
cms-worker_1                 |   . enterprise.tasks.create_enterprise_enrollment
cms-worker_1                 |   . enterprise.tasks.send_enterprise_email_notification
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.cleanup_duplicate_assignment_records
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_content_metadata
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_single_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_single_subsection_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_subsection_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.unlink_inactive_learners
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.update_content_transmission_catalog
cms-worker_1                 |   . lms.djangoapps.certificates.tasks.generate_certificate
cms-worker_1                 |   . lms.djangoapps.discussion.tasks.send_ace_message
cms-worker_1                 |   . lms.djangoapps.discussion.tasks.update_discussions_map
cms-worker_1                 |   . lms.djangoapps.grades.tasks.compute_all_grades_for_course
cms-worker_1                 |   . lms.djangoapps.grades.tasks.compute_grades_for_course
cms-worker_1                 |   . lms.djangoapps.grades.tasks.compute_grades_for_course_v2
cms-worker_1                 |   . lms.djangoapps.grades.tasks.recalculate_course_and_subsection_grades_for_user
cms-worker_1                 |   . lms.djangoapps.grades.tasks.recalculate_subsection_grade_v3
cms-worker_1                 |   . lms.djangoapps.instructor.tasks.update_exam_completion_task
cms-worker_1                 |   . lms.djangoapps.verify_student.tasks.send_request_to_ss_for_user
cms-worker_1                 |   . lms.djangoapps.verify_student.tasks.send_verification_status_email
cms-worker_1                 |   . openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache_v2
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache_v2
cms-worker_1                 |   . openedx.core.djangoapps.content.course_overviews.tasks.async_course_overview_update
cms-worker_1                 |   . openedx.core.djangoapps.content_libraries.tasks.import_blocks_from_course
cms-worker_1                 |   . openedx.core.djangoapps.course_apps.tasks.cache_all_course_apps_status
cms-worker_1                 |   . openedx.core.djangoapps.course_apps.tasks.update_course_apps_status
cms-worker_1                 |   . openedx.core.djangoapps.credit.tasks.update_credit_course_requirements
cms-worker_1                 |   . openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task
cms-worker_1                 |   . openedx.core.djangoapps.heartbeat.tasks.sample_task
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.award_course_certificate
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.award_program_certificates
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.revoke_program_certificates
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.update_certificate_visible_date_on_course_update
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.update_credentials_course_certificate_configuration_available_date
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseNextSectionUpdate
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseUpdate
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleRecurringNudge
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleUpgradeReminder
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks._course_update_schedule_send
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks._recurring_nudge_schedule_send
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks._upgrade_reminder_schedule_send
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.update_course_schedules
cms-worker_1                 |   . openedx.core.djangoapps.service_status.tasks.delayed_ping
cms-worker_1                 |   . openedx.core.djangoapps.user_authn.tasks.check_pwned_password_and_send_track_event
cms-worker_1                 |   . openedx.core.djangoapps.user_authn.tasks.send_activation_email
cms-worker_1                 |   . openedx.core.djangoapps.verified_track_content.tasks.sync_cohort_with_mode
cms-worker_1                 |   . super_csv.mixins.do_deferred_commit
cms-worker_1                 |   . user_tasks.tasks.purge_old_user_tasks
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:16,518: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 2.00 seconds... (1/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:18,550: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 4.00 seconds... (2/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:22,561: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 6.00 seconds... (3/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:28,615: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 8.00 seconds... (4/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:36,722: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 10.00 seconds... (5/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:46,739: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | Trying again in 12.00 seconds... (6/100)
cms-worker_1                 | 
cms-worker_1                 | [2022-10-17 11:52:58,833: INFO/MainProcess] Connected to redis://redis:6379/0
cms-worker_1                 | [2022-10-17 11:52:58,843: INFO/MainProcess] mingle: searching for neighbors
cms-worker_1                 | [2022-10-17 11:52:59,858: INFO/MainProcess] mingle: all alone
cms-worker_1                 | [2022-10-17 11:52:59,892: INFO/MainProcess] celery@edx.cms.core.default.%7a28f7644e28 ready.

OK so Cairn is not the culprit here. You may re-enable it if you want.

Let’s look at Redis. It’s unusual that it takes so long to load data in memory. How much data do you have? You can check by running:

ls -lh $(tutor config printroot)/data/redis

Quite some data:

tutor@production:~$ ls -lh $(tutor config printroot)/data/redis
total 3,9G
-rw-r--r-- 1 tutor systemd-coredump 3,6G okt 17 13:01 appendonly.aof
-rw-r--r-- 1 tutor tutor            296M okt 17 12:59 dump.rdb

Let’s try to remove all data from Redis:

tutor local stop redis
sudo mv $(tutor config printroot)/data/redis $(tutor config printroot)/data/redis.backup
tutor local restart redis cms-worker

After you’ve run these commands, is the course export feature working?

EDIT: I’ve attempted to reproduce the issue with a large Redis dataset, without success. Here’s what I did to fill the Redis cache:

tutor local run cms ./manage.py cms shell -c 'import string, random; from django.core.cache import cache
for _ in range(10**7):
    cache.set("dummy_" + "".join(random.choices(string.ascii_letters, k=256)), "".join(random.choices(string.ascii_letters, k=256)), timeout=3600); print(f"{_/10**5}%")'

I’ve also moved the Redis volume to a slower disk than my SSD.

Then, restart the redis and cms-worker containers:

tutor local restart redis cms-worker

I can see that the cms-worker is having trouble connecting to Redis:

tutor_local-cms-worker-1  | [2022-10-17 13:39:50,213: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
tutor_local-cms-worker-1  | Trying again in 2.00 seconds... (1/100)
tutor_local-cms-worker-1  | 
tutor_local-cms-worker-1  | [2022-10-17 13:39:52,221: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
tutor_local-cms-worker-1  | Trying again in 4.00 seconds... (2/100)

Eventually connection is successful:

tutor_local-cms-worker-1  | [2022-10-17 13:39:56,229: INFO/MainProcess] Connected to redis://redis:6379/0
tutor_local-cms-worker-1  | [2022-10-17 13:39:56,232: INFO/MainProcess] mingle: searching for neighbors
tutor_local-cms-worker-1  | [2022-10-17 13:39:57,241: INFO/MainProcess] mingle: all alone

I am then able to successfuly trigger a course export from the Studio. Alternatively, we can trigger an artificial asynchronous task by running:

tutor local run cms ./manage.py cms shell -c 'from cms.djangoapps.contentstore.tasks import update_search_index; update_search_index.delay("course-v1:edX+DemoX+Demo_Course", "2022-10-17T15:00:00.00")'

We can see that the task is correctly picked up by the cms-worker.

At this point I really don’t know what to do to reproduce the error…

Well, even though you can’t reproduce it, removing the Redis data seems to have solved the issue. I did another quickstart to make sure there was a new Redis directory with the correct ownership rights, and then everything worked…

Just to be sure I ran tutor local restart redis cms-worker again, and exporting courses still works.

So I’m sorry you can’t reproduce the issue, but I’m eternally grateful for your help solving it!

Am I correct in assuming we don’t need the old redis data, because it’s mostly cache? Or does it contain something important?

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.