Celery tasks are broken

Hi all, I’m hoping somebody can help me. I seem to be running into exactly this unsolved problem. When I try to export a course, it stays stuck on “Exporting…”. I also tried sending an email from a course, which stays stuck on:

So this makes me assume that Celery is somehow broken. I can’t find any errors in the logs, though. Nothing in the lms-worker and cms-worker logs, nor are there any errors in the lms and cms logs when I try to run these tasks…

I’ve been trying to debug celery from inside one of the containers, but there I do get errors. I’m not sure if I’m just doing it wrong, though…

$ tutor local exec cms-worker -- celery status
docker-compose -f /home/tutor/.local/share/tutor/env/local/docker-compose.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.prod.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.tmp.yml --project-name tutor_local exec cms-worker celery status
Traceback (most recent call last):
  File "/openedx/venv/lib/python3.8/site-packages/amqp/transport.py", line 188, in _connect
    entries = socket.getaddrinfo(
  File "/opt/pyenv/versions/3.8.12/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -9] Address family for hostname not supported

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/openedx/venv/lib/python3.8/site-packages/kombu/connection.py", line 446, in _reraise_as_library_errors
    yield
  File "/openedx/venv/lib/python3.8/site-packages/kombu/connection.py", line 433, in _ensure_connection
    return retry_over_time(
  File "/openedx/venv/lib/python3.8/site-packages/kombu/utils/functional.py", line 312, in retry_over_time
    return fun(*args, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/kombu/connection.py", line 877, in _connection_factory
    self._connection = self._establish_connection()
  File "/openedx/venv/lib/python3.8/site-packages/kombu/connection.py", line 812, in _establish_connection
    conn = self.transport.establish_connection()
  File "/openedx/venv/lib/python3.8/site-packages/kombu/transport/pyamqp.py", line 201, in establish_connection
    conn.connect()
  File "/openedx/venv/lib/python3.8/site-packages/amqp/connection.py", line 323, in connect
    self.transport.connect()
  File "/openedx/venv/lib/python3.8/site-packages/amqp/transport.py", line 129, in connect
    self._connect(self.host, self.port, self.connect_timeout)
  File "/openedx/venv/lib/python3.8/site-packages/amqp/transport.py", line 197, in _connect
    raise (e
  File "/openedx/venv/lib/python3.8/site-packages/amqp/transport.py", line 213, in _connect
    self.sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/openedx/venv/bin/celery", line 8, in <module>
    sys.exit(main())
  File "/openedx/venv/lib/python3.8/site-packages/celery/__main__.py", line 15, in main
    sys.exit(_main())
  File "/openedx/venv/lib/python3.8/site-packages/celery/bin/celery.py", line 217, in main
    return celery(auto_envvar_prefix="CELERY")
  File "/openedx/venv/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/openedx/venv/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/openedx/venv/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/openedx/venv/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/celery/bin/base.py", line 134, in caller
    return f(ctx, *args, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/celery/bin/control.py", line 80, in status
    replies = ctx.obj.app.control.inspect(timeout=timeout,
  File "/openedx/venv/lib/python3.8/site-packages/celery/app/control.py", line 294, in ping
    return self._request('ping')
  File "/openedx/venv/lib/python3.8/site-packages/celery/app/control.py", line 106, in _request
    return self._prepare(self.app.control.broadcast(
  File "/openedx/venv/lib/python3.8/site-packages/celery/app/control.py", line 741, in broadcast
    return self.mailbox(conn)._broadcast(
  File "/openedx/venv/lib/python3.8/site-packages/kombu/pidbox.py", line 328, in _broadcast
    chan = channel or self.connection.default_channel
  File "/openedx/venv/lib/python3.8/site-packages/kombu/connection.py", line 895, in default_channel
    self._ensure_connection(**conn_opts)
  File "/openedx/venv/lib/python3.8/site-packages/kombu/connection.py", line 433, in _ensure_connection
    return retry_over_time(
  File "/opt/pyenv/versions/3.8.12/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/openedx/venv/lib/python3.8/site-packages/kombu/connection.py", line 450, in _reraise_as_library_errors
    raise ConnectionError(str(exc)) from exc
kombu.exceptions.OperationalError: [Errno 111] Connection refused
Error: Command failed with status 1: docker-compose -f /home/tutor/.local/share/tutor/env/local/docker-compose.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.prod.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.tmp.yml --project-name tutor_local exec cms-worker celery status

Any advice is welcome! Thanks!

I’m running Nutmeg, with

tutor --version
tutor, version 14.0.5

Turns out there’s some kind of race condition that made our Celery start incorrectly(?). I’m still not entirely sure what happened, but restarting the lms-worker and cms-worker containers made them pick up all the stalling tasks.

I’m not sure what to do to prevent this from happening next time, but I’ll give you some more information about our setup so you might be able to reproduce the problem:

  1. We’re running Tutor 14.0.5
  2. We’re running the following plugins:
tutor plugins list
android                              	(disabled)	14.0.0
caddy-security                       	          	0.1.0
cairn                                	          	14.0.1
customcairn                          	          	0.1.0
discovery                            	(disabled)	14.0.1
ecommerce                            	(disabled)	14.0.0
enable-course-staff-grade-download   	          	0.1.0
enable-custom-certificate-templates  	          	0.1.0
enable-discussion-dashboard          	          	0.1.0
extra-translations                   	          	0.1.2
filter-tracking-log                  	          	0.1.0
fix-packages-for-xblock-video        	          	0.1.0
forum                                	          	14.0.0
hide-invisible-courses               	          	0.2.3
indigo                               	          	14.0.0
license                              	          	14.0.0
mfe                                  	          	14.0.0
min-password-length                  	          	0.1.0
minio                                	(disabled)	14.0.0
notes                                	(disabled)	14.0.0
richie                               	(disabled)	14.0.0
show-language-picker                 	          	0.1.1
sort-courses-by-start-date           	(disabled)	0.1.0
studio-redirect                      	          	0.2.0
totem-details                        	          	0.1.0
webui                                	(disabled)	14.0.0
xqueue                               	(disabled)	14.0.0

For the custom plugins, check out our tutor configuration repository, but I don’t expect our plugins to have a big effect on how the workers start.

For now, the workaround is to run docker restart <container_id> for the worker containers.

The reason why we think it’s a race condition is that this problem occurred for the first time after a reboot of the VPS. Then it wasn’t solved by running tutor local reboot -d, but it was solved by only restarting the specific worker containers.

@regis this seems to be a pretty serious problem with our Tutor setup. If this is an actual race condition I can imagine it affects more Tutor users, but maybe only the ones that use Cairn? I’m not sure why or how, but if you can help us investigate the problem that would be greatly appreciated!

The first thing I’m going to need is some information coming from the worker logs.

  1. On worker startup, what is the output of tutor local logs lms-worker? This will indicate which tasks the worker is configured to pick up.
  2. Once a task has been triggered, what are the logs from the lms, redis and the lms-worker? (run: tutor local logs -f lms lms-worker redis) This will help us understand whether the task is correctly queued in Redis.

Thank you for the quick reply!

We restarted tutor again to break it, and then generated these logs:

Regis_Q1.log (11.9 KB)

As for your second question, we encountered the bug while trying to export courses, so we ran:

tutor local logs -f cms cms-worker redis

And the output after triggering an export was:

cms_1                           | 2022-10-11 22:08:56,924 INFO 7 [tracking] [user 0] [ip x.x.x.x] logger.py:41 - {"name": "/export/course-v1:Totem+TP_SM_FR+cours", "context": {"user_id": 0, "path": "/export/course-v1:Totem+TP_SM_FR+cours", "course_id": "", "org_id": "", "enterprise_uuid": ""}, "username": "x", "session": "63c6e60317955153", "ip": "x.x.x.x", "agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.15.5 Chrome/87.0.4280.144 Safari/537.36", "host": "studio.learn.totem-project.org", "referer": "https://studio.learn.totem-project.org/export/course-v1:Totem+TP_SM_FR+cours", "accept_language": "fr-FR,fr;q=0.5", "event": "{\"GET\": {}, \"POST\": {}}", "time": "2022-10-11T22:08:56.924170+00:00", "event_type": "/export/course-v1:Totem+TP_SM_FR+cours", "event_source": "server", "page": null}
cms_1                           | [pid: 7|app: 0|req: 4/7] 172.18.0.5 () {58 vars in 3877 bytes} [Tue Oct 11 22:08:56 2022] POST /export/course-v1:Totem+TP_SM_FR+cours => generated 23 bytes in 201 msecs (HTTP/1.1 200) 7 headers in 500 bytes (1 switches on core 0)
cms_1                           | 2022-10-11 22:08:58,143 INFO 21 [tracking] [user 0] [ip x.x.x.x] logger.py:41 - {"name": "/export_status/course-v1:Totem+TP_SM_FR+cours", "context": {"user_id": 0, "path": "/export_status/course-v1:Totem+TP_SM_FR+cours", "course_id": "", "org_id": "", "enterprise_uuid": ""}, "username": "x", "session": "63c6e60317955153", "ip": "x.x.x.x", "agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.15.5 Chrome/87.0.4280.144 Safari/537.36", "host": "studio.learn.totem-project.org", "referer": "https://studio.learn.totem-project.org/export/course-v1:Totem+TP_SM_FR+cours", "accept_language": "fr-FR,fr;q=0.5", "event": "{\"GET\": {}, \"POST\": {}}", "time": "2022-10-11T22:08:58.143650+00:00", "event_type": "/export_status/course-v1:Totem+TP_SM_FR+cours", "event_source": "server", "page": null}
cms_1                           | [pid: 21|app: 0|req: 4/8] 172.18.0.5 () {54 vars in 3665 bytes} [Tue Oct 11 22:08:58 2022] GET /export_status/course-v1:Totem+TP_SM_FR+cours => generated 23 bytes in 45 msecs (HTTP/1.1 200) 7 headers in 500 bytes (1 switches on core 0)
cms_1                           | 2022-10-11 22:08:59,209 INFO 7 [tracking] [user 0] [ip x.x.x.x] logger.py:41 - {"name": "/export_status/course-v1:Totem+TP_SM_FR+cours", "context": {"user_id": 0, "path": "/export_status/course-v1:Totem+TP_SM_FR+cours", "course_id": "", "org_id": "", "enterprise_uuid": ""}, "username": "x", "session": "63c6e60317955153", "ip": "x.x.x.x", "agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.15.5 Chrome/87.0.4280.144 Safari/537.36", "host": "studio.learn.totem-project.org", "referer": "https://studio.learn.totem-project.org/export/course-v1:Totem+TP_SM_FR+cours", "accept_language": "fr-FR,fr;q=0.5", "event": "{\"GET\": {}, \"POST\": {}}", "time": "2022-10-11T22:08:59.208955+00:00", "event_type": "/export_status/course-v1:Totem+TP_SM_FR+cours", "event_source": "server", "page": null}
cms_1                           | [pid: 7|app: 0|req: 5/9] 172.18.0.5 () {54 vars in 3665 bytes} [Tue Oct 11 22:08:59 2022] GET /export_status/course-v1:Totem+TP_SM_FR+cours => generated 23 bytes in 49 msecs (HTTP/1.1 200) 7 headers in 500 bytes (1 switches on core 0)

It’s really strange that the redis and the cms-worker container are not logging anything.

Is the cms-worker logging anything at all concerning asynchronous tasks?

The cms-worker doesn’t log anything about those tasks until we’ve restarted its container.

Also note that when we run this inside the worker:

/openedx/venv/bin/python /openedx/venv/bin/celery --app=lms.celery status

In a broken state, the output is something like:

->  celery@5f3f5226ea94: OK

But in a working state, it’s something like:

->  celery@4c85561096ef: OK
->  celery@edx.cms.core.default.%83c7c3c5f985: OK

In the lms-worker log file you posted I see the following lines:

lms-worker_1                    | 2022-10-11 21:35:12,049 ERROR 1 [celery.worker.consumer.consumer] [user 0] [ip x.x.x.x] consumer.py:451 - consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
lms-worker_1                    | Trying again in 2.00 seconds... (1/100)
lms-worker_1                    |
lms-worker_1                    | 2022-10-11 21:35:14,080 ERROR 1 [celery.worker.consumer.consumer] [user 0] [ip x.x.x.x] consumer.py:451 - consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
lms-worker_1                    | Trying again in 4.00 seconds... (2/100)

Do you see the same thing in the cms-worker logs? For the sake of fixing the issue, let’s ignore the lms-worker for now and focus on cms-worker.

Yes, that does also happen in the CMS worker logs:

cms-worker_1                    |  -------------- celery@edx.cms.core.default.%7a28f7644e28 v5.2.6 (dawn-chorus)
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    | -- ******* ---- Linux-5.15.54-x86_64-with-glibc2.29 2022-10-11 22:05:50
cms-worker_1                    | - *** --- * --- 
cms-worker_1                    | - ** ---------- [config]
cms-worker_1                    | - ** ---------- .> app:         proj:0x7fd9ba18eb20
cms-worker_1                    | - ** ---------- .> transport:   redis://redis:6379/0
cms-worker_1                    | - ** ---------- .> results:     
cms-worker_1                    | - *** --- * --- .> concurrency: 6 (prefork)
cms-worker_1                    | -- ******* ---- .> task events: ON
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    |  -------------- [queues]
cms-worker_1                    |                 .> edx.cms.core.default exchange=edx.cms.core(direct) key=edx.cms.core.default
cms-worker_1                    |                 .> edx.cms.core.high exchange=edx.cms.core(direct) key=edx.cms.core.high
cms-worker_1                    |                 .> edx.cms.core.low exchange=edx.cms.core(direct) key=edx.cms.core.low
cms-worker_1                    | 
cms-worker_1                    | [tasks]
cms-worker_1                    |   . celery_utils.tasks.mark_resolved
cms-worker_1                    |   . cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.export_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.import_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.rerun_course
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_all_outlines_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_library_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_search_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_special_exams_and_publish
cms-worker_1                    |   . cms.djangoapps.coursegraph.tasks.dump_course_to_neo4j
cms-worker_1                    |   . cms.djangoapps.export_course_metadata.tasks.export_course_metadata_task
cms-worker_1                    |   . common.djangoapps.entitlements.tasks.expire_old_entitlements
cms-worker_1                    |   . edx_name_affirmation.tasks.delete_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.idv_update_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.proctoring_update_verified_name_task
cms-worker_1                    |   . edx_sga.tasks.zip_student_submissions
cms-worker_1                    |   . enterprise.tasks.create_enterprise_enrollment
cms-worker_1                    |   . enterprise.tasks.send_enterprise_email_notification
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.cleanup_duplicate_assignment_records
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_content_metadata
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.unlink_inactive_learners
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.update_content_transmission_catalog
cms-worker_1                    |   . lms.djangoapps.certificates.tasks.generate_certificate
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.send_ace_message
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.update_discussions_map
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_all_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course_v2
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_course_and_subsection_grades_for_user
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_subsection_grade_v3
cms-worker_1                    |   . lms.djangoapps.instructor.tasks.update_exam_completion_task
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_request_to_ss_for_user
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_verification_status_email
cms-worker_1                    |   . openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.course_overviews.tasks.async_course_overview_update
cms-worker_1                    |   . openedx.core.djangoapps.content_libraries.tasks.import_blocks_from_course
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.cache_all_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.update_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.credit.tasks.update_credit_course_requirements
cms-worker_1                    |   . openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task
cms-worker_1                    |   . openedx.core.djangoapps.heartbeat.tasks.sample_task
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_course_certificate
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.revoke_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_certificate_visible_date_on_course_update
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_credentials_course_certificate_configuration_available_date
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseNextSectionUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleRecurringNudge
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleUpgradeReminder
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._course_update_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._recurring_nudge_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._upgrade_reminder_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.update_course_schedules
cms-worker_1                    |   . openedx.core.djangoapps.service_status.tasks.delayed_ping
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.check_pwned_password_and_send_track_event
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.send_activation_email
cms-worker_1                    |   . openedx.core.djangoapps.verified_track_content.tasks.sync_cohort_with_mode
cms-worker_1                    |   . super_csv.mixins.do_deferred_commit
cms-worker_1                    |   . user_tasks.tasks.purge_old_user_tasks
cms-worker_1                    | 
cms-worker_1                    | [2022-10-11 22:05:55,125: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 2.00 seconds... (1/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-11 22:05:57,142: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 4.00 seconds... (2/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-11 22:06:01,149: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 6.00 seconds... (3/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-11 22:06:07,166: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 8.00 seconds... (4/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-11 22:06:15,177: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 10.00 seconds... (5/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-11 22:06:25,194: INFO/MainProcess] Connected to redis://redis:6379/0
cms-worker_1                    | [2022-10-11 22:06:25,201: INFO/MainProcess] mingle: searching for neighbors
cms-worker_1                    | [2022-10-11 22:06:26,217: INFO/MainProcess] mingle: all alone
cms-worker_1                    | [2022-10-11 22:06:26,254: INFO/MainProcess] celery@edx.cms.core.default.%7a28f7644e28 ready.
cms-worker_1                    | 

As you can see, it does report Connected to redis://redis:6379/0 in the end.

This is the log after restarting:

cms-worker_1                    | worker: Warm shutdown (MainProcess)
cms-worker_1                    |  
cms-worker_1                    |  -------------- celery@edx.cms.core.default.%7a28f7644e28 v5.2.6 (dawn-chorus)
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    | -- ******* ---- Linux-5.15.54-x86_64-with-glibc2.29 2022-10-11 22:14:12
cms-worker_1                    | - *** --- * --- 
cms-worker_1                    | - ** ---------- [config]
cms-worker_1                    | - ** ---------- .> app:         proj:0x7f1a3ba6aac0
cms-worker_1                    | - ** ---------- .> transport:   redis://redis:6379/0
cms-worker_1                    | - ** ---------- .> results:     
cms-worker_1                    | - *** --- * --- .> concurrency: 6 (prefork)
cms-worker_1                    | -- ******* ---- .> task events: ON
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    |  -------------- [queues]
cms-worker_1                    |                 .> edx.cms.core.default exchange=edx.cms.core(direct) key=edx.cms.core.default
cms-worker_1                    |                 .> edx.cms.core.high exchange=edx.cms.core(direct) key=edx.cms.core.high
cms-worker_1                    |                 .> edx.cms.core.low exchange=edx.cms.core(direct) key=edx.cms.core.low
cms-worker_1                    | 
cms-worker_1                    | [tasks]
cms-worker_1                    |   . celery_utils.tasks.mark_resolved
cms-worker_1                    |   . cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.export_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.import_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.rerun_course
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_all_outlines_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_library_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_search_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_special_exams_and_publish
cms-worker_1                    |   . cms.djangoapps.coursegraph.tasks.dump_course_to_neo4j
cms-worker_1                    |   . cms.djangoapps.export_course_metadata.tasks.export_course_metadata_task
cms-worker_1                    |   . common.djangoapps.entitlements.tasks.expire_old_entitlements
cms-worker_1                    |   . edx_name_affirmation.tasks.delete_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.idv_update_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.proctoring_update_verified_name_task
cms-worker_1                    |   . edx_sga.tasks.zip_student_submissions
cms-worker_1                    |   . enterprise.tasks.create_enterprise_enrollment
cms-worker_1                    |   . enterprise.tasks.send_enterprise_email_notification
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.cleanup_duplicate_assignment_records
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_content_metadata
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.unlink_inactive_learners
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.update_content_transmission_catalog
cms-worker_1                    |   . lms.djangoapps.certificates.tasks.generate_certificate
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.send_ace_message
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.update_discussions_map
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_all_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course_v2
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_course_and_subsection_grades_for_user
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_subsection_grade_v3
cms-worker_1                    |   . lms.djangoapps.instructor.tasks.update_exam_completion_task
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_request_to_ss_for_user
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_verification_status_email
cms-worker_1                    |   . openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.course_overviews.tasks.async_course_overview_update
cms-worker_1                    |   . openedx.core.djangoapps.content_libraries.tasks.import_blocks_from_course
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.cache_all_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.update_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.credit.tasks.update_credit_course_requirements
cms-worker_1                    |   . openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task
cms-worker_1                    |   . openedx.core.djangoapps.heartbeat.tasks.sample_task
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_course_certificate
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.revoke_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_certificate_visible_date_on_course_update
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_credentials_course_certificate_configuration_available_date
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseNextSectionUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleRecurringNudge
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleUpgradeReminder
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._course_update_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._recurring_nudge_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._upgrade_reminder_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.update_course_schedules
cms-worker_1                    |   . openedx.core.djangoapps.service_status.tasks.delayed_ping
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.check_pwned_password_and_send_track_event
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.send_activation_email
cms-worker_1                    |   . openedx.core.djangoapps.verified_track_content.tasks.sync_cohort_with_mode
cms-worker_1                    |   . super_csv.mixins.do_deferred_commit
cms-worker_1                    |   . user_tasks.tasks.purge_old_user_tasks
cms-worker_1                    | 
cms-worker_1                    | [2022-10-11 22:14:15,678: INFO/MainProcess] Connected to redis://redis:6379/0
cms-worker_1                    | [2022-10-11 22:14:20,690: INFO/MainProcess] mingle: searching for neighbors
cms-worker_1                    | [2022-10-11 22:14:21,706: INFO/MainProcess] mingle: sync with 1 nodes
cms-worker_1                    | [2022-10-11 22:14:21,710: INFO/MainProcess] mingle: sync complete
cms-worker_1                    | [2022-10-11 22:14:21,741: INFO/MainProcess] celery@edx.cms.core.default.%7a28f7644e28 ready.
cms-worker_1                    | [2022-10-11 22:14:21,746: INFO/MainProcess] Task cms.djangoapps.contentstore.tasks.export_olx[aadea90c-ec1d-441c-a78f-10c570044c15] received
cms-worker_1                    | [2022-10-11 22:14:25,362: INFO/MainProcess] sync with celery@edx.lms.core.default.%439d57b0205f
cms-worker_1                    | [2022-10-11 22:14:29,672: INFO/MainProcess] Task cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email[268b9e5b-a7fc-42ae-82b5-2132877cc352] received
cms-worker_1                    | [2022-10-11 22:14:29,672: INFO/ForkPoolWorker-3] Task cms.djangoapps.contentstore.tasks.export_olx[aadea90c-ec1d-441c-a78f-10c570044c15] succeeded in 7.872583343996666s: None
cms-worker_1                    | [2022-10-11 22:14:29,917: INFO/ForkPoolWorker-3] cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email[268b9e5b-a7fc-42ae-82b5-2132877cc352]: Task complete email has been sent to User x@y.com

If there is a race condition, then with which container? My guess is that there would be a race condition with Redis. To verify this, once you are in a functional state, restart both Redis and the cms worker: tutor local restart redis cms-worker. Do you end up in a broken state?

In a broken state, what is the output of tutor local logs --tail=100 redis?

Yes, the system is now broken again.

Redis logs:

tutor@production:~$ tutor local logs --tail=100 redis
docker-compose -f /home/tutor/.local/share/tutor/env/local/docker-compose.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.prod.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.tmp.yml --project-name tutor_local logs --tail 100 redis
Attaching to tutor_local_redis_1
redis_1                         | 1540:C 17 Oct 2022 08:00:38.191 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:00:38.225 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:05:39.032 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:05:39.043 * Background saving started by pid 1541
redis_1                         | 1541:C 17 Oct 2022 08:05:45.200 * DB saved on disk
redis_1                         | 1541:C 17 Oct 2022 08:05:45.214 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:05:45.279 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:10:46.097 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:10:46.107 * Background saving started by pid 1542
redis_1                         | 1542:C 17 Oct 2022 08:10:51.402 * DB saved on disk
redis_1                         | 1542:C 17 Oct 2022 08:10:51.415 * RDB: 1 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:10:51.439 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:15:52.082 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:15:52.095 * Background saving started by pid 1543
redis_1                         | 1543:C 17 Oct 2022 08:15:58.097 * DB saved on disk
redis_1                         | 1543:C 17 Oct 2022 08:15:58.112 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:15:58.138 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:20:59.052 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:20:59.065 * Background saving started by pid 1544
redis_1                         | 1544:C 17 Oct 2022 08:21:04.817 * DB saved on disk
redis_1                         | 1544:C 17 Oct 2022 08:21:04.830 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:21:04.915 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:26:05.011 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:26:05.026 * Background saving started by pid 1545
redis_1                         | 1545:C 17 Oct 2022 08:26:11.200 * DB saved on disk
redis_1                         | 1545:C 17 Oct 2022 08:26:11.218 * RDB: 3 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:26:11.282 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:31:12.075 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:31:12.085 * Background saving started by pid 1546
redis_1                         | 1546:C 17 Oct 2022 08:31:17.493 * DB saved on disk
redis_1                         | 1546:C 17 Oct 2022 08:31:17.506 * RDB: 1 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:31:17.522 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:36:18.076 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:36:18.088 * Background saving started by pid 1547
redis_1                         | 1547:C 17 Oct 2022 08:36:23.456 * DB saved on disk
redis_1                         | 1547:C 17 Oct 2022 08:36:23.475 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:36:23.519 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:41:24.054 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:41:24.064 * Background saving started by pid 1548
redis_1                         | 1548:C 17 Oct 2022 08:41:29.303 * DB saved on disk
redis_1                         | 1548:C 17 Oct 2022 08:41:29.315 * RDB: 3 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:41:29.397 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:46:30.064 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:46:30.074 * Background saving started by pid 1549
redis_1                         | 1549:C 17 Oct 2022 08:46:35.634 * DB saved on disk
redis_1                         | 1549:C 17 Oct 2022 08:46:35.647 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:46:35.719 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:51:36.098 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:51:36.111 * Background saving started by pid 1550
redis_1                         | 1550:C 17 Oct 2022 08:51:41.199 * DB saved on disk
redis_1                         | 1550:C 17 Oct 2022 08:51:41.211 * RDB: 1 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:51:41.245 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 08:56:42.024 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 08:56:42.042 * Background saving started by pid 1551
redis_1                         | 1551:C 17 Oct 2022 08:56:47.326 * DB saved on disk
redis_1                         | 1551:C 17 Oct 2022 08:56:47.340 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 08:56:47.375 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 09:01:48.050 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:01:48.064 * Background saving started by pid 1552
redis_1                         | 1552:C 17 Oct 2022 09:01:53.227 * DB saved on disk
redis_1                         | 1552:C 17 Oct 2022 09:01:53.239 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:01:53.298 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 09:06:54.063 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:06:54.075 * Background saving started by pid 1553
redis_1                         | 1553:C 17 Oct 2022 09:07:00.243 * DB saved on disk
redis_1                         | 1553:C 17 Oct 2022 09:07:00.264 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:07:00.322 * Background saving terminated with success
redis_1                         | 1:M 17 Oct 2022 09:12:01.087 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:12:01.099 * Background saving started by pid 1554
redis_1                         | 1554:C 17 Oct 2022 09:12:07.178 * DB saved on disk
redis_1                         | 1554:C 17 Oct 2022 09:12:07.194 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:12:07.241 * Background saving terminated with success
redis_1                         | 1:signal-handler (1665998184) Received SIGTERM scheduling shutdown...
redis_1                         | 1:M 17 Oct 2022 09:16:24.918 # User requested shutdown...
redis_1                         | 1:M 17 Oct 2022 09:16:24.918 * Calling fsync() on the AOF file.
redis_1                         | 1:M 17 Oct 2022 09:16:24.927 * Saving the final RDB snapshot before exiting.
redis_1                         | 1:M 17 Oct 2022 09:16:30.529 * DB saved on disk
redis_1                         | 1:M 17 Oct 2022 09:16:30.529 * Removing the pid file.
redis_1                         | 1:M 17 Oct 2022 09:16:30.529 # Redis is now ready to exit, bye bye...
redis_1                         | 1:C 17 Oct 2022 09:16:31.557 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                         | 1:C 17 Oct 2022 09:16:31.557 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                         | 1:C 17 Oct 2022 09:16:31.557 # Configuration loaded
redis_1                         | 1:M 17 Oct 2022 09:16:31.559 * monotonic clock: POSIX clock_gettime
redis_1                         | 1:M 17 Oct 2022 09:16:31.562 * Running mode=standalone, port=6379.
redis_1                         | 1:M 17 Oct 2022 09:16:31.562 # Server initialized
redis_1                         | 1:M 17 Oct 2022 09:16:31.562 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * Reading RDB preamble from AOF file...
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * Loading RDB produced by version 6.2.1
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * RDB age 6305295 seconds
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * RDB memory usage when created 280.22 Mb
redis_1                         | 1:M 17 Oct 2022 09:16:31.566 * RDB has an AOF tail
redis_1                         | 1:M 17 Oct 2022 09:16:35.119 # Done loading RDB, keys loaded: 44250, keys expired: 0.
redis_1                         | 1:M 17 Oct 2022 09:16:35.119 * Reading the remaining AOF tail...
redis_1                         | 1:M 17 Oct 2022 09:17:26.066 * DB loaded from append only file: 54.504 seconds
redis_1                         | 1:M 17 Oct 2022 09:17:26.066 * Ready to accept connections
redis_1                         | 1:M 17 Oct 2022 09:17:32.011 * 10000 changes in 60 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:17:32.024 * Background saving started by pid 13
redis_1                         | 13:C 17 Oct 2022 09:17:37.715 * DB saved on disk
redis_1                         | 13:C 17 Oct 2022 09:17:37.731 * RDB: 2 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:17:37.765 * Background saving terminated with success

CMS worker logs:

tutor@production:~$ tutor local logs --tail=100 cms-worker
docker-compose -f /home/tutor/.local/share/tutor/env/local/docker-compose.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.prod.yml -f /home/tutor/.local/share/tutor/env/local/docker-compose.tmp.yml --project-name tutor_local logs --tail 100 cms-worker
Attaching to tutor_local_cms-worker_1
cms-worker_1                    | - ** ---------- .> results:     
cms-worker_1                    | - *** --- * --- .> concurrency: 6 (prefork)
cms-worker_1                    | -- ******* ---- .> task events: ON
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    |  -------------- [queues]
cms-worker_1                    |                 .> edx.cms.core.default exchange=edx.cms.core(direct) key=edx.cms.core.default
cms-worker_1                    |                 .> edx.cms.core.high exchange=edx.cms.core(direct) key=edx.cms.core.high
cms-worker_1                    |                 .> edx.cms.core.low exchange=edx.cms.core(direct) key=edx.cms.core.low
cms-worker_1                    | 
cms-worker_1                    | [tasks]
cms-worker_1                    |   . celery_utils.tasks.mark_resolved
cms-worker_1                    |   . cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.export_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.import_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.rerun_course
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_all_outlines_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_library_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_search_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_special_exams_and_publish
cms-worker_1                    |   . cms.djangoapps.coursegraph.tasks.dump_course_to_neo4j
cms-worker_1                    |   . cms.djangoapps.export_course_metadata.tasks.export_course_metadata_task
cms-worker_1                    |   . common.djangoapps.entitlements.tasks.expire_old_entitlements
cms-worker_1                    |   . edx_name_affirmation.tasks.delete_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.idv_update_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.proctoring_update_verified_name_task
cms-worker_1                    |   . edx_sga.tasks.zip_student_submissions
cms-worker_1                    |   . enterprise.tasks.create_enterprise_enrollment
cms-worker_1                    |   . enterprise.tasks.send_enterprise_email_notification
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.cleanup_duplicate_assignment_records
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_content_metadata
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.unlink_inactive_learners
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.update_content_transmission_catalog
cms-worker_1                    |   . lms.djangoapps.certificates.tasks.generate_certificate
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.send_ace_message
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.update_discussions_map
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_all_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course_v2
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_course_and_subsection_grades_for_user
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_subsection_grade_v3
cms-worker_1                    |   . lms.djangoapps.instructor.tasks.update_exam_completion_task
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_request_to_ss_for_user
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_verification_status_email
cms-worker_1                    |   . openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.course_overviews.tasks.async_course_overview_update
cms-worker_1                    |   . openedx.core.djangoapps.content_libraries.tasks.import_blocks_from_course
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.cache_all_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.update_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.credit.tasks.update_credit_course_requirements
cms-worker_1                    |   . openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task
cms-worker_1                    |   . openedx.core.djangoapps.heartbeat.tasks.sample_task
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_course_certificate
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.revoke_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_certificate_visible_date_on_course_update
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_credentials_course_certificate_configuration_available_date
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseNextSectionUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleRecurringNudge
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleUpgradeReminder
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._course_update_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._recurring_nudge_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._upgrade_reminder_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.update_course_schedules
cms-worker_1                    |   . openedx.core.djangoapps.service_status.tasks.delayed_ping
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.check_pwned_password_and_send_track_event
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.send_activation_email
cms-worker_1                    |   . openedx.core.djangoapps.verified_track_content.tasks.sync_cohort_with_mode
cms-worker_1                    |   . super_csv.mixins.do_deferred_commit
cms-worker_1                    |   . user_tasks.tasks.purge_old_user_tasks
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:16:59,345: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 2.00 seconds... (1/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:01,351: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 4.00 seconds... (2/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:05,374: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 6.00 seconds... (3/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:11,386: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 8.00 seconds... (4/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:19,409: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                    | Trying again in 10.00 seconds... (5/100)
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:17:29,433: INFO/MainProcess] Connected to redis://redis:6379/0
cms-worker_1                    | [2022-10-17 09:17:34,444: INFO/MainProcess] mingle: searching for neighbors
cms-worker_1                    | [2022-10-17 09:17:35,485: INFO/MainProcess] mingle: sync with 2 nodes
cms-worker_1                    | [2022-10-17 09:17:35,490: INFO/MainProcess] mingle: sync complete
cms-worker_1                    | [2022-10-17 09:17:35,550: INFO/MainProcess] celery@edx.cms.core.default.%7a28f7644e28 ready.

Redis logs after docker restart <cms worker id>:

redis_1                         | 1:M 17 Oct 2022 09:22:38.069 * 10 changes in 300 seconds. Saving...
redis_1                         | 1:M 17 Oct 2022 09:22:38.080 * Background saving started by pid 14
redis_1                         | 14:C 17 Oct 2022 09:22:43.624 * DB saved on disk
redis_1                         | 14:C 17 Oct 2022 09:22:43.637 * RDB: 1 MB of memory used by copy-on-write
redis_1                         | 1:M 17 Oct 2022 09:22:43.723 * Background saving terminated with success

CMS logs after cms worker restart:

cms-worker_1                    |  -------------- celery@edx.cms.core.default.%7a28f7644e28 v5.2.6 (dawn-chorus)
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    | -- ******* ---- Linux-5.15.54-x86_64-with-glibc2.29 2022-10-17 09:22:57
cms-worker_1                    | - *** --- * --- 
cms-worker_1                    | - ** ---------- [config]
cms-worker_1                    | - ** ---------- .> app:         proj:0x7fa59f9f4ac0
cms-worker_1                    | - ** ---------- .> transport:   redis://redis:6379/0
cms-worker_1                    | - ** ---------- .> results:     
cms-worker_1                    | - *** --- * --- .> concurrency: 6 (prefork)
cms-worker_1                    | -- ******* ---- .> task events: ON
cms-worker_1                    | --- ***** ----- 
cms-worker_1                    |  -------------- [queues]
cms-worker_1                    |                 .> edx.cms.core.default exchange=edx.cms.core(direct) key=edx.cms.core.default
cms-worker_1                    |                 .> edx.cms.core.high exchange=edx.cms.core(direct) key=edx.cms.core.high
cms-worker_1                    |                 .> edx.cms.core.low exchange=edx.cms.core(direct) key=edx.cms.core.low
cms-worker_1                    | 
cms-worker_1                    | [tasks]
cms-worker_1                    |   . celery_utils.tasks.mark_resolved
cms-worker_1                    |   . cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.export_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.import_olx
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.rerun_course
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_all_outlines_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_library_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_search_index
cms-worker_1                    |   . cms.djangoapps.contentstore.tasks.update_special_exams_and_publish
cms-worker_1                    |   . cms.djangoapps.coursegraph.tasks.dump_course_to_neo4j
cms-worker_1                    |   . cms.djangoapps.export_course_metadata.tasks.export_course_metadata_task
cms-worker_1                    |   . common.djangoapps.entitlements.tasks.expire_old_entitlements
cms-worker_1                    |   . edx_name_affirmation.tasks.delete_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.idv_update_verified_name_task
cms-worker_1                    |   . edx_name_affirmation.tasks.proctoring_update_verified_name_task
cms-worker_1                    |   . edx_sga.tasks.zip_student_submissions
cms-worker_1                    |   . enterprise.tasks.create_enterprise_enrollment
cms-worker_1                    |   . enterprise.tasks.send_enterprise_email_notification
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.cleanup_duplicate_assignment_records
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_content_metadata
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_single_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.transmit_subsection_learner_data
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.unlink_inactive_learners
cms-worker_1                    |   . integrated_channels.integrated_channel.tasks.update_content_transmission_catalog
cms-worker_1                    |   . lms.djangoapps.certificates.tasks.generate_certificate
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.send_ace_message
cms-worker_1                    |   . lms.djangoapps.discussion.tasks.update_discussions_map
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_all_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course
cms-worker_1                    |   . lms.djangoapps.grades.tasks.compute_grades_for_course_v2
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_course_and_subsection_grades_for_user
cms-worker_1                    |   . lms.djangoapps.grades.tasks.recalculate_subsection_grade_v3
cms-worker_1                    |   . lms.djangoapps.instructor.tasks.update_exam_completion_task
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_request_to_ss_for_user
cms-worker_1                    |   . lms.djangoapps.verify_student.tasks.send_verification_status_email
cms-worker_1                    |   . openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache
cms-worker_1                    |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache_v2
cms-worker_1                    |   . openedx.core.djangoapps.content.course_overviews.tasks.async_course_overview_update
cms-worker_1                    |   . openedx.core.djangoapps.content_libraries.tasks.import_blocks_from_course
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.cache_all_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.course_apps.tasks.update_course_apps_status
cms-worker_1                    |   . openedx.core.djangoapps.credit.tasks.update_credit_course_requirements
cms-worker_1                    |   . openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task
cms-worker_1                    |   . openedx.core.djangoapps.heartbeat.tasks.sample_task
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_course_certificate
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.award_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.revoke_program_certificates
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_certificate_visible_date_on_course_update
cms-worker_1                    |   . openedx.core.djangoapps.programs.tasks.update_credentials_course_certificate_configuration_available_date
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseNextSectionUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseUpdate
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleRecurringNudge
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.ScheduleUpgradeReminder
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._course_update_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._recurring_nudge_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks._upgrade_reminder_schedule_send
cms-worker_1                    |   . openedx.core.djangoapps.schedules.tasks.update_course_schedules
cms-worker_1                    |   . openedx.core.djangoapps.service_status.tasks.delayed_ping
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.check_pwned_password_and_send_track_event
cms-worker_1                    |   . openedx.core.djangoapps.user_authn.tasks.send_activation_email
cms-worker_1                    |   . openedx.core.djangoapps.verified_track_content.tasks.sync_cohort_with_mode
cms-worker_1                    |   . super_csv.mixins.do_deferred_commit
cms-worker_1                    |   . user_tasks.tasks.purge_old_user_tasks
cms-worker_1                    | 
cms-worker_1                    | [2022-10-17 09:23:05,557: INFO/MainProcess] Connected to redis://redis:6379/0
cms-worker_1                    | [2022-10-17 09:23:05,567: INFO/MainProcess] mingle: searching for neighbors
cms-worker_1                    | [2022-10-17 09:23:06,593: INFO/MainProcess] mingle: sync with 2 nodes
cms-worker_1                    | [2022-10-17 09:23:06,596: INFO/MainProcess] mingle: sync complete
cms-worker_1                    | [2022-10-17 09:23:06,641: INFO/MainProcess] celery@edx.cms.core.default.%7a28f7644e28 ready.
cms-worker_1                    | [2022-10-17 09:23:06,646: INFO/MainProcess] Task lms.djangoapps.grades.tasks.compute_all_grades_for_course[5d72f98b-99b3-402f-bca4-e065524526f2] received
cms-worker_1                    | [2022-10-17 09:23:06,652: INFO/MainProcess] Task cms.djangoapps.contentstore.tasks.export_olx[2b8bda19-5c2a-43e6-ac39-0c7ea0e6dbe3] received
cms-worker_1                    | [2022-10-17 09:23:06,655: INFO/MainProcess] Task cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79] received
cms-worker_1                    | [2022-10-17 09:23:06,658: INFO/MainProcess] Task cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task[c21ec78d-151e-4c4c-9682-824e5507338e] received
cms-worker_1                    | [2022-10-17 09:23:06,666: INFO/ForkPoolWorker-4] cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79]: Attempting to register exams for course course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,667: INFO/ForkPoolWorker-4] cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79]: Successfully registered exams for course course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,667: INFO/ForkPoolWorker-4] cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79]: Publishing course course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,668: INFO/MainProcess] Task cms.djangoapps.contentstore.tasks.update_search_index[1a2b9790-aa58-4765-b4fa-bd09ac1f4077] received
cms-worker_1                    | [2022-10-17 09:23:06,677: INFO/MainProcess] Task openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task[b4561790-e99d-44f1-a65d-2ab6fc358a06] received
cms-worker_1                    | [2022-10-17 09:23:06,685: INFO/MainProcess] Task openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache[8f89fc2a-0ab1-406f-9692-7205252d7861] received
cms-worker_1                    | [2022-10-17 09:23:06,689: INFO/MainProcess] Task openedx.core.djangoapps.course_apps.tasks.update_course_apps_status[8875d9b9-9ff6-4f11-b547-c13bdf19e454] received
cms-worker_1                    | [2022-10-17 09:23:06,696: INFO/MainProcess] Task lms.djangoapps.discussion.tasks.update_discussions_map[1a63e555-7608-47d3-bff3-aa1bfff3c8a9] received
cms-worker_1                    | [2022-10-17 09:23:06,696: INFO/ForkPoolWorker-5] Updating discussion settings for course: course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,747: INFO/ForkPoolWorker-4] Task cms.djangoapps.contentstore.tasks.update_special_exams_and_publish[7df8fa46-e879-46e6-a372-0b1d3585ca79] succeeded in 0.09090760699473321s: None
cms-worker_1                    | [2022-10-17 09:23:06,754: WARNING/ForkPoolWorker-3] Grades: Flag 'grades.enforce_freeze_grade_after_course_end' accessed without a request, which is likely in the context of a celery task.
cms-worker_1                    | [2022-10-17 09:23:06,757: INFO/ForkPoolWorker-4] Starting XBlockCaches update for course_key: course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,767: INFO/ForkPoolWorker-3] Attempted compute_all_grades_for_course for course 'course-v1:Totem+BOT101+course', but grades are frozen.
cms-worker_1                    | [2022-10-17 09:23:06,770: INFO/ForkPoolWorker-3] Task lms.djangoapps.grades.tasks.compute_all_grades_for_course[5d72f98b-99b3-402f-bca4-e065524526f2] succeeded in 0.11385197890922427s: None
cms-worker_1                    | [2022-10-17 09:23:06,776: INFO/ForkPoolWorker-3] openedx.core.djangoapps.course_apps.tasks.update_course_apps_status[8875d9b9-9ff6-4f11-b547-c13bdf19e454]: Caching course apps status for course with id: course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:06,800: INFO/ForkPoolWorker-2] HEAD http://elasticsearch:9200/courseware_content [status:200 request:0.010s]
cms-worker_1                    | [2022-10-17 09:23:06,944: INFO/ForkPoolWorker-2] HEAD http://elasticsearch:9200/course_info [status:200 request:0.005s]
cms-worker_1                    | [2022-10-17 09:23:06,985: INFO/ForkPoolWorker-2] GET http://elasticsearch:9200/course_info/_mapping [status:200 request:0.003s]
cms-worker_1                    | [2022-10-17 09:23:07,012: INFO/ForkPoolWorker-2] POST http://elasticsearch:9200/_bulk [status:200 request:0.021s]
cms-worker_1                    | [2022-10-17 09:23:08,057: INFO/ForkPoolWorker-2] GET http://elasticsearch:9200/courseware_content/_mapping [status:200 request:0.003s]
cms-worker_1                    | [2022-10-17 09:23:08,116: INFO/ForkPoolWorker-2] POST http://elasticsearch:9200/_bulk [status:200 request:0.035s]
cms-worker_1                    | [2022-10-17 09:23:08,130: INFO/ForkPoolWorker-2] POST http://elasticsearch:9200/courseware_content/_search [status:200 request:0.011s]
cms-worker_1                    | [2022-10-17 09:23:08,132: INFO/ForkPoolWorker-2] Task cms.djangoapps.contentstore.tasks.update_search_index[1a2b9790-aa58-4765-b4fa-bd09ac1f4077] succeeded in 1.4569336449494585s: None
cms-worker_1                    | [2022-10-17 09:23:08,182: INFO/ForkPoolWorker-2] Task lms.djangoapps.discussion.tasks.update_discussions_map[1a63e555-7608-47d3-bff3-aa1bfff3c8a9] succeeded in 0.04591151303611696s: None
cms-worker_1                    | [2022-10-17 09:23:11,919: INFO/ForkPoolWorker-5] Updating existing discussion topic links for course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:11,925: INFO/ForkPoolWorker-5] Creating new discussion topic links for course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:11,931: INFO/ForkPoolWorker-5] Responses of the Open edX Event <org.openedx.learning.discussions.configuration.changed.v1>: 
cms-worker_1                    | [(<function handle_course_discussion_config_update at 0x7fa5728d1f70>, None)]
cms-worker_1                    | [2022-10-17 09:23:11,933: INFO/ForkPoolWorker-5] Task openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task[b4561790-e99d-44f1-a65d-2ab6fc358a06] succeeded in 5.246437182999216s: None
cms-worker_1                    | [2022-10-17 09:23:11,965: INFO/ForkPoolWorker-1] Replacing CourseOutline for course-v1:Totem+TP_SD_FR+01 (version 634d11d481aec31389a88f93, 12 sequences)
cms-worker_1                    | [2022-10-17 09:23:11,978: INFO/ForkPoolWorker-1] Found CourseContext for course-v1:Totem+TP_SD_FR+01, updating...
cms-worker_1                    | [2022-10-17 09:23:12,042: WARNING/ForkPoolWorker-3] Flag 'course_live.enable_course_live' accessed without a request, which is likely in the context of a celery task.
cms-worker_1                    | [2022-10-17 09:23:12,055: WARNING/ForkPoolWorker-3] Flag 'teams.enable_teams_app' accessed without a request, which is likely in the context of a celery task.
cms-worker_1                    | [2022-10-17 09:23:12,086: INFO/ForkPoolWorker-3] Task openedx.core.djangoapps.course_apps.tasks.update_course_apps_status[8875d9b9-9ff6-4f11-b547-c13bdf19e454] succeeded in 5.311822294024751s: None
cms-worker_1                    | [2022-10-17 09:23:12,138: INFO/ForkPoolWorker-4] Updating XBlockCache with usage_key: block-v1:Totem+TP_SD_FR+01+type@vertical+block@073c8cdf616f4dbe9035bc76d3415a78
cms-worker_1                    | [2022-10-17 09:23:12,143: INFO/ForkPoolWorker-4] Updating XBlockCache with usage_key: block-v1:Totem+TP_SD_FR+01+type@html+block@9b3b90445210463fa900218b9d408be5
cms-worker_1                    | [2022-10-17 09:23:12,146: INFO/ForkPoolWorker-4] Updating XBlockCache with usage_key: block-v1:Totem+TP_SD_FR+01+type@vertical+block@ba7bb75754b14485a5153fdee4808d1c
cms-worker_1                    | [2022-10-17 09:23:12,148: INFO/ForkPoolWorker-4] Updating XBlockCache with usage_key: block-v1:Totem+TP_SD_FR+01+type@html+block@68e2796c9a4243b88a531525e7a9ed27
cms-worker_1                    | [2022-10-17 09:23:12,179: INFO/ForkPoolWorker-4] Ending XBlockCaches update for course_key: course-v1:Totem+TP_SD_FR+01
cms-worker_1                    | [2022-10-17 09:23:12,182: INFO/ForkPoolWorker-4] Task openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache[8f89fc2a-0ab1-406f-9692-7205252d7861] succeeded in 5.425541295087896s: None
cms-worker_1                    | [2022-10-17 09:23:12,206: INFO/ForkPoolWorker-1] Task cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task[c21ec78d-151e-4c4c-9682-824e5507338e] succeeded in 5.535299669019878s: None
cms-worker_1                    | [2022-10-17 09:23:17,800: INFO/MainProcess] Task cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email[aa284a42-f396-4927-aa9d-c71b6e5f4290] received
cms-worker_1                    | [2022-10-17 09:23:17,801: INFO/ForkPoolWorker-6] Task cms.djangoapps.contentstore.tasks.export_olx[2b8bda19-5c2a-43e6-ac39-0c7ea0e6dbe3] succeeded in 11.094052562024444s: None
cms-worker_1                    | [2022-10-17 09:23:17,963: INFO/ForkPoolWorker-3] cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email[aa284a42-f396-4927-aa9d-c71b6e5f4290]: Task complete email has been sent to User x@y.com
cms-worker_1                    | [2022-10-17 09:23:17,966: INFO/ForkPoolWorker-3] Task cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email[aa284a42-f396-4927-aa9d-c71b6e5f4290] succeeded in 0.16261423006653786s: None

In a broken state, what are the logs coming from the cairn-superset-worker container?

tutor local logs --tail=100 cairn-superset-worker

Might be on to something here:

Logs before “breaking”:

cairn-superset-worker_1         | [2022-10-17 10:16:00,065: INFO/MainProcess] Received task: reports.scheduler[0f512dce-bd4a-4dee-9d9e-380a37b5267e]  
cairn-superset-worker_1         | [2022-10-17 10:16:00,108: INFO/ForkPoolWorker-3] Task reports.scheduler[0f512dce-bd4a-4dee-9d9e-380a37b5267e] succeeded in 0.040656305965967476s: None

Logs after “breaking”:

cairn-superset-worker_1         | [2022-10-17 10:18:01,290: INFO/MainProcess] missed heartbeat from celery@edx.lms.core.default.%439d57b0205f
cairn-superset-worker_1         | [2022-10-17 10:18:01,292: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
cairn-superset-worker_1         | Traceback (most recent call last):
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 706, in send_packed_command
cairn-superset-worker_1         |     sendall(self._sock, item)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/_compat.py", line 9, in sendall
cairn-superset-worker_1         |     return sock.sendall(*args, **kwargs)
cairn-superset-worker_1         | BrokenPipeError: [Errno 32] Broken pipe
cairn-superset-worker_1         | 
cairn-superset-worker_1         | During handling of the above exception, another exception occurred:
cairn-superset-worker_1         | 
cairn-superset-worker_1         | Traceback (most recent call last):
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 318, in start
cairn-superset-worker_1         |     blueprint.start(self)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/celery/bootsteps.py", line 119, in start
cairn-superset-worker_1         |     step.start(parent)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/celery/worker/consumer/consumer.py", line 599, in start
cairn-superset-worker_1         |     c.loop(*c.loop_args())
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/celery/worker/loops.py", line 83, in asynloop
cairn-superset-worker_1         |     next(loop)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/asynchronous/hub.py", line 364, in create_loop
cairn-superset-worker_1         |     cb(*cbargs)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 1083, in on_readable
cairn-superset-worker_1         |     self.cycle.on_readable(fileno)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 354, in on_readable
cairn-superset-worker_1         |     chan.handlers[type]()
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 688, in _receive
cairn-superset-worker_1         |     ret.append(self._receive_one(c))
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/kombu/transport/redis.py", line 698, in _receive_one
cairn-superset-worker_1         |     response = c.parse_response()
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 3501, in parse_response
cairn-superset-worker_1         |     self.check_health()
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/client.py", line 3522, in check_health
cairn-superset-worker_1         |     check_health=False)
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 726, in send_command
cairn-superset-worker_1         |     check_health=kwargs.get('check_health', True))
cairn-superset-worker_1         |   File "/usr/local/lib/python3.7/site-packages/redis/connection.py", line 718, in send_packed_command
cairn-superset-worker_1         |     (errno, errmsg))
cairn-superset-worker_1         | redis.exceptions.ConnectionError: Error 32 while writing to socket. Broken pipe.
cairn-superset-worker_1         | [2022-10-17 10:18:01,317: INFO/MainProcess] Connected to redis://redis:6379/0
cairn-superset-worker_1         | [2022-10-17 10:18:01,329: INFO/MainProcess] mingle: searching for neighbors
cairn-superset-worker_1         | [2022-10-17 10:18:02,338: INFO/MainProcess] mingle: all alone
cairn-superset-worker_1         | [2022-10-17 10:18:02,373: INFO/MainProcess] Received task: reports.scheduler[a3f8ca63-297a-48ab-96f2-196521334496]  
cairn-superset-worker_1         | [2022-10-17 10:18:02,412: INFO/ForkPoolWorker-3] Task reports.scheduler[a3f8ca63-297a-48ab-96f2-196521334496] succeeded in 0.032858264981769025s: None
cairn-superset-worker_1         | [2022-10-17 10:18:12,073: INFO/MainProcess] sync with celery@edx.cms.core.default.%7a28f7644e28
cairn-superset-worker_1         | [2022-10-17 10:19:00,037: INFO/MainProcess] Received task: reports.scheduler[b03a01c6-fe4f-42a2-8301-49396b539c7b]  
cairn-superset-worker_1         | [2022-10-17 10:19:00,063: INFO/ForkPoolWorker-3] Task reports.scheduler[b03a01c6-fe4f-42a2-8301-49396b539c7b] succeeded in 0.023648836999200284s: None
cairn-superset-worker_1         | [2022-10-17 10:16:00,065: INFO/MainProcess] Received task: reports.scheduler[0f512dce-bd4a-4dee-9d9e-380a37b5267e]  
cairn-superset-worker_1         | [2022-10-17 10:16:00,108: INFO/ForkPoolWorker-3] Task reports.scheduler[0f512dce-bd4a-4dee-9d9e-380a37b5267e] succeeded in 0.040656305965967476s: None

So it just had the one error, but seems to pick up OK after that… Interestingly it skipped 10:17, which is around the time I ran tutor local restart redis cms-worker

Yes, the cairn worker is failing because it is (incorrectly) configured to connect to the default Redis instance, instead of the cairn-redis container. Thus, the cairn worker is failing when we restart the redis container. The mingle: sync with 2 nodes log line tipped me off.

The superset and the cms workers should really not be “mingling”. Let’s stop the cairn worker and see whether this resolves the issue:

tutor local stop cairn-superset-worker
tutor local restart redis cms-worker

After these commands, are you in a failed state?

I ran those commands, the CMS entered the failing state again. Maybe because the other superset containers are still running? like the cairn redis?

I also do still see a cairn worker running, after running that command:

docker ps | grep cairn
f94d7124a2c9   overhangio/cairn-superset:14.0.1     "/bin/sh -c 'gunicor…"   5 days ago   Up 5 days (healthy)   8088/tcp                                                                             tutor_local_cairn-superset_1
1afd8d16dfed   overhangio/cairn-superset:14.0.1     "celery beat --app=s…"   5 days ago   Up 5 days             8088/tcp                                                                             tutor_local_cairn-superset-worker-beat_1
663cb123fa47   postgres:9.6-alpine                  "docker-entrypoint.s…"   5 days ago   Up 5 days             5432/tcp                                                                             tutor_local_cairn-postgresql_1
67655edc04d7   overhangio/cairn-clickhouse:14.0.1   "/entrypoint.sh"         5 days ago   Up 5 days             8123/tcp, 9000/tcp, 9009/tcp                                                         tutor_local_cairn-clickhouse_1
c3a7fa385ec5   timberio/vector:0.18.1-alpine        "/usr/local/bin/vect…"   5 days ago   Up 5 days                                                                                                  tutor_local_cairn-vector_1
fbab7fa123e5   redis:6.2.4-alpine                   "docker-entrypoint.s…"   5 days ago   Up 5 days             6379/tcp                                                                             tutor_local_cairn-redis_1

Right. Let’s stop it as well and attempt to restart the cms-worker:

tutor local stop cairn-superset-worker cairn-superset-worker-beat
tutor local restart redis cms-worker

At this point, are we still in a failed state?

OK, so I did not run the docker restart <cms worker id> first, I just ran those commands. It’s still broken.

By the way, thank you a lot for your help so far!