Hello everyone,
We conducted a whole lot of tests around this feature by installing django-celery-beat in a test environment deployed with Kubernetes.
We used OPENEDX_EXTRA_PIP_REQUIREMENTS in the Tutor config file to install django-celery-beat, and a Tutor inline plugin to add it (as django_celery_beat) to INSTALLED_APPS and to register the new tasks that celery-beat should run.
Additionally, we needed to change the k8s/deployments.yml file, adding the “--beat” flag to the arguments of the lms-workers deployment.
With this setup, we managed to have a functional development environment for Open edX using django-celery-beat to run cron jobs.
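For anyone who wants to reproduce the setup, the inline plugin part can be sketched roughly as follows. This is a minimal sketch rather than our exact plugin: it assumes a recent Tutor release with the `hooks` plugin API, the file name is hypothetical, and it covers only the INSTALLED_APPS change (the pip requirement and the “--beat” flag are configured separately, as described above).

```python
# Hypothetical single-file Tutor plugin, e.g. celery_beat.py placed in the
# Tutor plugins root (the file name is an assumption, not from our setup).
from tutor import hooks

# Append a line to the LMS settings that Tutor renders, so that the
# django_celery_beat app is loaded by Django.
hooks.Filters.ENV_PATCHES.add_item(
    (
        "openedx-lms-common-settings",
        'INSTALLED_APPS.append("django_celery_beat")',
    )
)
```

After enabling the plugin, the environment has to be regenerated and the deployment restarted for the new setting to take effect.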
With this environment up and running, we performed different tests, including scaling the lms-workers to more pods/instances and checking the uniqueness of the scheduled tasks (which was an issue in the past). Sadly, django-celery-beat has not changed in this regard: it still does not solve the problem of multiple celery-beat instances coexisting at the same time. With multiple pods/instances started with the “--beat” flag, each one behaves as a scheduler and executes the scheduled tasks, resulting in task duplication. This is the same problem edx.org had before.
In conclusion, using django-celery-beat requires us to have only one pod or instance running celery-beat at a time, so additional mechanisms need to be applied to ensure this.
There are some possible solutions for Open edX projects that have more than one pod/instance of the workers:
A: Add an external locking mechanism to ensure that only one of the workers is initialized with the “--beat” flag.
Pros: does not require Tutor changes. It would apply to the whole project regardless of the orchestration technology.
Cons: it requires a new locking mechanism to be added to the core
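To make option A more concrete, here is a minimal sketch of the idea, not a proposal for the actual implementation. The `InMemoryLockStore` class is a hypothetical stand-in for whatever shared backend would hold the lock (real pods would need a store they can all reach), and `worker_args`, the lock key, and the pod names are likewise illustrative assumptions.

```python
import time


class InMemoryLockStore:
    """Hypothetical stand-in for a shared lock backend with expiring locks."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._locks = {}  # key -> (owner, expiry timestamp)

    def acquire(self, key, owner, ttl):
        """Take or renew the lock; return True if `owner` holds it afterwards."""
        now = self._clock()
        holder = self._locks.get(key)
        # The lock is free, expired, or already ours (renewal).
        if holder is None or holder[1] <= now or holder[0] == owner:
            self._locks[key] = (owner, now + ttl)
            return True
        return False


def worker_args(store, pod_name, ttl=60.0):
    """Build celery worker arguments, adding --beat only for the lock holder."""
    args = ["worker", "--loglevel=info"]
    if store.acquire("celery-beat-leader", pod_name, ttl):
        args.append("--beat")
    return args


if __name__ == "__main__":
    store = InMemoryLockStore()
    for pod in ("lms-worker-0", "lms-worker-1", "lms-worker-2"):
        print(pod, worker_args(store, pod))
```

Note that the lock expires after `ttl` seconds, so the pod holding it would have to keep renewing it, and would have to stop scheduling if a renewal ever fails. That renewal loop is the part that would need to live in the core.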
B: Deploy celery-beat as a StatefulSet, which assigns a stable, unique identity to each pod, so that with a single replica only one active instance of Celery Beat exists at any time.
Pros: easy to guarantee the uniqueness of the pod.
Cons: only applies to k8s. It will create a second worker pod even for instances that don’t require it.
C: Add a new Kubernetes deployment that starts an lms or cms worker using the “--beat” flag. This could be done using a Tutor plugin and managed through a flag in the Tutor configuration. However, workers started with “--beat” would still be able to execute tasks of normal workers, implying duplicated code and at least 2 workers running.
Pros: does not require many changes to the current stack. It could be tested as a plugin and only then proposed for the Tutor core.
Cons: it would be each operator’s responsibility not to scale the number of pods for this deployment. Only applies to k8s. Instances running with compose in prod would not have it.
D: Not using celery-beat. Instead use an external scheduler like Kubernetes CronJobs to centrally schedule tasks. This would allow us to execute tasks not only in LMS but also in other components, and we could manage it through a Tutor plugin.
Pros: in our testing, this was easy to manage.
Cons: only applies to k8s. Instances running with compose in prod would not have it.
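As an illustration of option D, the sketch below builds a Kubernetes CronJob manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. It is only a sketch under assumptions: the job name, image, schedule, and management command are hypothetical placeholders, and a real Tutor plugin would more likely render this as a YAML patch.

```python
import json


def build_cronjob(name, schedule, image, command):
    """Return a batch/v1 CronJob manifest that runs `command` on `schedule`."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            # Do not start a new run while the previous one is still running.
            "concurrencyPolicy": "Forbid",
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = build_cronjob(
        name="lms-nightly-task",  # hypothetical job name
        schedule="0 3 * * *",  # every day at 03:00
        image="example/openedx:latest",  # hypothetical image
        command=["./manage.py", "lms", "my_scheduled_command"],  # hypothetical command
    )
    print(json.dumps(manifest, indent=2))
```

Because Kubernetes itself triggers each run, the number of worker pods no longer matters for scheduling, and the same pattern could be reused for components other than the LMS.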
We would like to hear your opinions and possible ideas about this implementation. Thank you in advance!