Hi everyone, this is my first post in this community, and I need some help, please.
I’m facing an issue with the completion-aggregator plugin and Celery/lms-worker.
I have two environments, stage and production, and they behave differently.
In stage, in the lms-worker container, when I execute python manage.py lms run_aggregator_service, Celery receives the task and executes it, and the StaleCompletion entries for my user are resolved (I verified this in the Django admin). I can see the log line showing that the task was received:
INFO 2024-03-22 14:36:38,215 [celery.worker.strategy] [140552876794496] strategy strategy.py.task_message_handler:157 - Received task: completion_aggregator.tasks.aggregation_tasks.update_aggregators[0a7dfe81-d21b-4a24-a1dc-830de853dc5c]
If I execute it again, I see the message "No StaleCompletions to process. Exiting".
But if I do the same in the production environment, I don’t see the log line showing that Celery received the task.
I can also execute python manage.py lms run_aggregator_service indefinitely, because the StaleCompletions are never resolved. The strange thing is that I do see this log line when run_aggregator_service is executed:
INFO|Task completion_aggregator.tasks.aggregation_tasks.update_aggregators[624ea8f8-879a-4e96-a234-20c87c8f71cd] submitted with arguments None, {'username': '...', 'course_key': '...', 'block_keys': [...], 'force': False}
Because these environments exist for different purposes, I thought that a configuration or setting with different values could be causing this issue.
I compared the settings related to completion-aggregator, and they are the same:
COMPLETION_AGGREGATOR_AGGREGATION_LOCK = True
COMPLETION_AGGREGATOR_AGGREGATION_LOCK_TIMEOUT_SECONDS = 1000
COMPLETION_AGGREGATOR_ASYNC_AGGREGATION = True
COMPLETION_AGGREGATOR_BLOCK_TYPES = {'sequential', 'vertical', 'course', 'chapter'}
COMPLETION_AGGREGATOR_CLEANUP_LOCK = True
COMPLETION_AGGREGATOR_CLEANUP_LOCK_TIMEOUT_SECONDS = 1000
COMPLETION_BY_VIEWING_DELAY_MS = 5000
COMPLETION_VIDEO_COMPLETE_PERCENTAGE = 0.95
I compared the settings related to Celery, and they are the same:
CELERYBEAT_SCHEDULE = {'refresh-saml-metadata': {'task': 'common.djangoapps.third_party_auth.fetch_saml_metadata', 'schedule': datetime.timedelta(days=1)}} ###
CELERYBEAT_SCHEDULER = 'celery.beat:PersistentScheduler' ###
CELERYD_HIJACK_ROOT_LOGGER = False ###
CELERYD_PREFETCH_MULTIPLIER = 1 ###
CELERY_ALWAYS_EAGER = True ###
CELERY_BROKER_HOSTNAME = 'redis:6379' ###
CELERY_BROKER_PASSWORD = '' ###
CELERY_BROKER_TRANSPORT = 'redis' ###
CELERY_BROKER_USER = '' ###
CELERY_BROKER_USE_SSL = False ###
CELERY_BROKER_VHOST = '0' ###
CELERY_CREATE_MISSING_QUEUES = True ###
CELERY_DEFAULT_EXCHANGE = 'edx.lms.core' ###
CELERY_DEFAULT_EXCHANGE_TYPE = 'direct' ###
CELERY_DEFAULT_QUEUE = 'edx.lms.core.default' ###
CELERY_DEFAULT_ROUTING_KEY = 'edx.lms.core.default' ###
CELERY_EVENT_QUEUE_TTL = None ###
CELERY_IGNORE_RESULT = False ###
CELERY_IMPORTS = ('poll.tasks',) ###
CELERY_MESSAGE_COMPRESSION = 'gzip' ###
CELERY_QUEUES = {'edx.lms.core.high': {}, 'edx.lms.core.default': {}, 'edx.lms.core.high_mem': {}, 'edx.cms.core.default': {}} ###
CELERY_QUEUE_HA_POLICY = 'all' ###
CELERY_RESULT_BACKEND = 'django-cache' ###
CELERY_RESULT_SERIALIZER = 'json' ###
CELERY_ROUTES = 'openedx.core.lib.celery.routers.route_task' ###
CELERY_SEND_EVENTS = True ###
CELERY_SEND_TASK_SENT_EVENT = True ###
CELERY_STORE_ERRORS_EVEN_IF_IGNORED = True ###
CELERY_TASK_SERIALIZER = 'json' ###
CELERY_TIMEZONE = 'UTC' ###
CELERY_TRACK_STARTED = True ###
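A comparison done by eye can miss a setting that exists in only one environment. Below is a minimal sketch of an automated comparison, assuming each environment's settings have been dumped into a plain Python dict. The function name diff_settings, the MISSING sentinel, and the two example dicts are hypothetical and only illustrate the idea; they are not real dumps from either environment.

```python
MISSING = "<not defined>"  # sentinel so an absent setting is distinct from None


def diff_settings(stage, production):
    """Return {name: (stage_value, production_value)} for every differing setting."""
    differences = {}
    # Walk the union of names so a key present on only one side is reported too.
    for name in sorted(set(stage) | set(production)):
        stage_value = stage.get(name, MISSING)
        production_value = production.get(name, MISSING)
        if stage_value != production_value:
            differences[name] = (stage_value, production_value)
    return differences


# Hypothetical dumps; the values are illustrative only.
stage_settings = {"CELERY_DEFAULT_QUEUE": "edx.lms.core.default"}
production_settings = {
    "CELERY_DEFAULT_QUEUE": "edx.lms.core.default",
    "CELERY_ALWAYS_EAGER": True,
}

for name, (stage_value, production_value) in diff_settings(
    stage_settings, production_settings
).items():
    print(f"{name}: stage={stage_value!r} production={production_value!r}")
```

Reporting missing keys explicitly matters here, because a setting that is undefined in one environment falls back to its default and can still change behaviour.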
In both environments the tasks are registered. This is the relevant output of celery -A lms.celery inspect registered:
completion_aggregator.tasks.aggregation_tasks.migrate_batch
completion_aggregator.tasks.aggregation_tasks.update_aggregators
completion_aggregator.tasks.handler_tasks.mark_all_stale
I checked EXPLICIT_QUEUES: the completion-aggregator tasks are not listed in either environment, and the task still works in stage.
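One way to reason about this check is to ask which queue the task ends up in and whether any worker consumes that queue. The sketch below is a simplified illustration, not the real openedx.core.lib.celery.routers.route_task. It assumes EXPLICIT_QUEUES maps task names to dicts with a "queue" key, and that an unlisted task falls back to CELERY_DEFAULT_QUEUE. The helper names resolve_queue and is_consumed, the worker name, and the worker queue list are hypothetical; the real list of consumed queues could be taken from celery -A lms.celery inspect active_queues.

```python
def resolve_queue(task_name, explicit_queues, default_queue):
    """Hypothetical stand-in for a router: an explicit entry wins, else the default."""
    route = explicit_queues.get(task_name)
    if route and "queue" in route:
        return route["queue"]
    return default_queue


def is_consumed(queue, worker_queues):
    """Return True if at least one worker listens on the given queue."""
    return any(queue in queues for queues in worker_queues.values())


TASK = "completion_aggregator.tasks.aggregation_tasks.update_aggregators"

# The task is not listed in EXPLICIT_QUEUES, so the mapping is empty for it.
queue = resolve_queue(TASK, {}, "edx.lms.core.default")

# Hypothetical worker-to-queues mapping; replace with real per-environment data.
worker_queues = {"lms-worker": ["edx.lms.core.default", "edx.lms.core.high"]}

print(queue, "consumed" if is_consumed(queue, worker_queues) else "NOT consumed")
```

If the resolved queue is not consumed by any worker in production, the task would be submitted but never received, which would match the logs above.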
I read about the CELERY_ALWAYS_EAGER setting, but in stage it isn’t defined and the task still works.
So I don’t know what else to check. Could you give me a clue about what else I can check or compare between the environments?
Thank you all!