Completion aggregator tasks aren't executed by lms-worker

:slight_smile: Hi everyone, this is my first post in this community and I need some help, please.

I'm facing an issue with the completion-aggregator plugin and Celery/lms-worker.
I have two environments, stage and production, and I see different behaviour between them.

In stage, in the lms-worker container, when I execute python manage.py lms run_aggregator_service, Celery receives the task, executes it, and the StaleCompletions for my user are resolved (I verified this in the Django admin). I can see the log showing that the task was received:

INFO 2024-03-22 14:36:38,215 [celery.worker.strategy] [140552876794496] strategy strategy.py.task_message_handler:157 - Received task: completion_aggregator.tasks.aggregation_tasks.update_aggregators[0a7dfe81-d21b-4a24-a1dc-830de853dc5c]

If I execute it again, I see the message: No StaleCompletions to process. Exiting.

But if I do the same in the production environment, I never see the log showing that Celery received the task.
Also, I can execute python manage.py lms run_aggregator_service indefinitely, because the StaleCompletions are never resolved. The strange thing is that I can still see these logs when run_aggregator_service is executed:

INFO|Task completion_aggregator.tasks.aggregation_tasks.update_aggregators[624ea8f8-879a-4e96-a234-20c87c8f71cd] submitted with arguments None, {'username': '...', 'course_key': '...', 'block_keys': [...], 'force': False}
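
For reference, this is how I double-check the pending records from a Django shell, besides the admin (a quick sketch; I'm assuming the resolved flag on StaleCompletion is what run_aggregator_service processes):

# python manage.py lms shell, inside the lms or lms-worker container
from completion_aggregator.models import StaleCompletion

# Unresolved rows are the ones the aggregator service should pick up;
# in production this count never goes down.
unresolved = StaleCompletion.objects.filter(resolved=False)
print(unresolved.count())
print(list(unresolved.values("username", "course_key")[:5]))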

Since these environments exist for different purposes, I thought a configuration or setting with different values could be causing this issue.

I compared the settings related to completion-aggregator and they are the same:

COMPLETION_AGGREGATOR_AGGREGATION_LOCK = True
COMPLETION_AGGREGATOR_AGGREGATION_LOCK_TIMEOUT_SECONDS = 1000
COMPLETION_AGGREGATOR_ASYNC_AGGREGATION = True
COMPLETION_AGGREGATOR_BLOCK_TYPES = {'sequential', 'vertical', 'course', 'chapter'}
COMPLETION_AGGREGATOR_CLEANUP_LOCK = True
COMPLETION_AGGREGATOR_CLEANUP_LOCK_TIMEOUT_SECONDS = 1000
COMPLETION_BY_VIEWING_DELAY_MS = 5000
COMPLETION_VIDEO_COMPLETE_PERCENTAGE = 0.95

I compared the settings related to Celery and they are the same too:

CELERYBEAT_SCHEDULE = {'refresh-saml-metadata': {'task': 'common.djangoapps.third_party_auth.fetch_saml_metadata', 'schedule': datetime.timedelta(days=1)}}  ###
CELERYBEAT_SCHEDULER = 'celery.beat:PersistentScheduler'  ###
CELERYD_HIJACK_ROOT_LOGGER = False  ###
CELERYD_PREFETCH_MULTIPLIER = 1  ###
CELERY_ALWAYS_EAGER = True  ###
CELERY_BROKER_HOSTNAME = 'redis:6379'  ###
CELERY_BROKER_PASSWORD = ''  ###
CELERY_BROKER_TRANSPORT = 'redis'  ###
CELERY_BROKER_USER = ''  ###
CELERY_BROKER_USE_SSL = False  ###
CELERY_BROKER_VHOST = '0'  ###
CELERY_CREATE_MISSING_QUEUES = True  ###
CELERY_DEFAULT_EXCHANGE = 'edx.lms.core'  ###
CELERY_DEFAULT_EXCHANGE_TYPE = 'direct'  ###
CELERY_DEFAULT_QUEUE = 'edx.lms.core.default'  ###
CELERY_DEFAULT_ROUTING_KEY = 'edx.lms.core.default'  ###
CELERY_EVENT_QUEUE_TTL = None  ###
CELERY_IGNORE_RESULT = False  ###
CELERY_IMPORTS = ('poll.tasks',)  ###
CELERY_MESSAGE_COMPRESSION = 'gzip'  ###
CELERY_QUEUES = {'edx.lms.core.high': {}, 'edx.lms.core.default': {}, 'edx.lms.core.high_mem': {}, 'edx.cms.core.default': {}}  ###
CELERY_QUEUE_HA_POLICY = 'all'  ###
CELERY_RESULT_BACKEND = 'django-cache'  ###
CELERY_RESULT_SERIALIZER = 'json'  ###
CELERY_ROUTES = 'openedx.core.lib.celery.routers.route_task'  ###
CELERY_SEND_EVENTS = True  ###
CELERY_SEND_TASK_SENT_EVENT = True  ###
CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True  ###
CELERY_TASK_SERIALIZER = 'json'  ###
CELERY_TIMEZONE = 'UTC'  ###
CELERY_TRACK_STARTED = True  ###
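
A quick way to compare what each running process actually loads (rather than the rendered config files) is from a Django shell in each environment; a minimal sketch:

# python manage.py lms shell
from django.conf import settings

for name in ("CELERY_ALWAYS_EAGER", "CELERY_DEFAULT_QUEUE", "CELERY_ROUTES"):
    # getattr with a default, since a setting may simply not be defined
    print(name, "=", getattr(settings, name, "<not defined>"))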

In both environments the tasks are registered, per celery -A lms.celery inspect registered:

completion_aggregator.tasks.aggregation_tasks.migrate_batch
completion_aggregator.tasks.aggregation_tasks.update_aggregators
completion_aggregator.tasks.handler_tasks.mark_all_stale

I checked EXPLICIT_QUEUES, and the completion aggregator tasks are not listed in either environment, yet in stage the task works anyway.
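
I suppose I could also ask the router directly where the task would be sent, to rule out a routing difference between the environments (a sketch, assuming the LMS Celery app object is lms.celery.APP, which is what celery -A lms.celery picks up):

# python manage.py lms shell
from lms.celery import APP

route = APP.amqp.router.route(
    {}, "completion_aggregator.tasks.aggregation_tasks.update_aggregators"
)
print(route)  # compare queue/exchange/routing_key between stage and production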

I read about the CELERY_ALWAYS_EAGER setting (as I understand it, when it is True, tasks run synchronously in the submitting process instead of going through the broker), but in stage it isn't defined and the task is still working.

So I don't know what else to check. Maybe you can give me a clue about what else to check or compare between the environments.

Thank you all!

Hmm, if CELERY_ALWAYS_EAGER is set to True in production, I wonder if that's your problem? Try explicitly setting it to False.

Otherwise, I'm not sure what to say. You've done a great job of debugging the issue so far, and you seem to have a pretty good understanding of what's going on. Perhaps find some other Celery task in the system, like course export, and see how that works on prod vs. stage?
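
One more thing worth checking: whether the production worker is even consuming the queue the task gets sent to. Something like this (a sketch using Celery's inspect API; again assuming the app object is lms.celery.APP):

# python manage.py lms shell, in production
from lms.celery import APP

insp = APP.control.inspect()
print(insp.ping())           # can the broker reach any worker at all?
print(insp.active_queues())  # which queues is each worker actually bound to?

If ping() returns nothing, or the queue from the task's route isn't in active_queues(), then the submitting process and the worker are talking to different brokers (or different Redis databases / queues), which would produce exactly the symptoms you describe: the task is "submitted" but never received.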

Thanks Braden for the reply!

I made a mistake with the Celery settings that I put in the post: those settings come from my local environment. These are the ones from production and stage; CELERY_ALWAYS_EAGER is not defined (and the aggregator task still works in stage):

CELERYBEAT_SCHEDULE = {'refresh-saml-metadata': {'task': 'common.djangoapps.third_party_auth.fetch_saml_metadata', 'schedule': datetime.timedelta(days=1)}}  ###
CELERYBEAT_SCHEDULER = 'celery.beat:PersistentScheduler'  ###
CELERYD_HIJACK_ROOT_LOGGER = False  ###
CELERYD_PREFETCH_MULTIPLIER = 1  ###
CELERY_BROKER_HOSTNAME = 'redis:6379'  ###
CELERY_BROKER_PASSWORD = ''  ###
CELERY_BROKER_TRANSPORT = 'redis'  ###
CELERY_BROKER_USER = ''  ###
CELERY_BROKER_USE_SSL = False  ###
CELERY_BROKER_VHOST = '0'  ###
CELERY_CREATE_MISSING_QUEUES = True  ###
CELERY_DEFAULT_EXCHANGE = 'edx.lms.core'  ###
CELERY_DEFAULT_EXCHANGE_TYPE = 'direct'  ###
CELERY_DEFAULT_QUEUE = 'edx.lms.core.default'  ###
CELERY_DEFAULT_ROUTING_KEY = 'edx.lms.core.default'  ###
CELERY_EVENT_QUEUE_TTL = None  ###
CELERY_IGNORE_RESULT = False  ###
CELERY_IMPORTS = ('poll.tasks',)  ###
CELERY_MESSAGE_COMPRESSION = 'gzip'  ###
CELERY_QUEUES = {'edx.lms.core.high': {}, 'edx.lms.core.default': {}, 'edx.lms.core.high_mem': {}, 'edx.cms.core.default': {}}  ###
CELERY_QUEUE_HA_POLICY = 'all'  ###
CELERY_RESULT_BACKEND = 'django-cache'  ###
CELERY_RESULT_SERIALIZER = 'json'  ###
CELERY_ROUTES = 'openedx.core.lib.celery.routers.route_task'  ###
CELERY_SEND_EVENTS = True  ###
CELERY_SEND_TASK_SENT_EVENT = True  ###
CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True  ###
CELERY_TASK_SERIALIZER = 'json'  ###
CELERY_TIMEZONE = 'UTC'  ###
CELERY_TRACK_STARTED = True  ###
ENV_CELERY_QUEUES = None  ###

Anyway, I can try setting it to False and see what happens with other Celery tasks.

For example, for another feature I created a task that is executed when the course_published signal is received, and it works fine in both environments; I didn't have to add any extra configuration.
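
Roughly, that task follows the usual pattern (a simplified sketch; the names are illustrative, not my real code):

# in a Django app plugin installed in the platform
from celery import shared_task
from django.dispatch import receiver
from xmodule.modulestore.django import SignalHandler

@shared_task
def handle_course_published(course_key_str):
    # the real task does the actual work for the feature
    ...

@receiver(SignalHandler.course_published)
def listen_for_course_publish(sender, course_key, **kwargs):
    # hand the work off to Celery so the publish flow isn't blocked
    handle_course_published.delay(str(course_key))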

Thank you for the suggestions!