Hey dear all,
I am currently working with @alecar on the final feature for the Survey Report project.
The goal of this step is to make it possible for instances to send the report (aggregated and anonymized data) automatically every six months.
Naturally this can be turned off and configured for each instance, but the goal is to make it easy for instances to report the data so we can measure the growth of the project.
As it stands now, the report can be generated and sent by a superuser in the admin panel. Sending it automatically requires that we have some way of launching async tasks on a regular schedule.
Currently there are two main ways of scheduling tasks in the platform:
- Celery Beat: adding django_celery_beat as an additional dependency has been done before. It was even backported to the koa and lilac branches.
- Scheduling tasks with an external scheduler like Jenkins or crontab: this is the approach that has been favored by edx. It is also how we at edunext manage our largest instances.
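To make the requirement concrete, here is a minimal sketch of the check a periodic job could run, whichever of the two schedulers triggers it. It is not code from the Survey Report project: `is_report_due`, `months_between`, and the `enabled` flag are hypothetical names, and I am assuming the instance stores the date of the last successful send somewhere. The idea is that the scheduler (a Celery Beat entry or a cron line) fires often, for example daily, and this check decides whether six months have actually passed.

```python
from datetime import date
from typing import Optional

# From the post: the report should go out every six months.
REPORT_INTERVAL_MONTHS = 6


def months_between(start: date, end: date) -> int:
    """Return the number of full calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    # If the day of the month has not been reached yet, the last month is incomplete.
    if end.day < start.day:
        months -= 1
    return months


def is_report_due(
    last_sent: Optional[date],
    today: date,
    enabled: bool = True,
    interval_months: int = REPORT_INTERVAL_MONTHS,
) -> bool:
    """Decide whether the report should be sent today (hypothetical helper)."""
    if not enabled:
        # The instance has turned automatic reporting off.
        return False
    if last_sent is None:
        # No report has ever been sent, so the first one is due.
        return True
    return months_between(last_sent, today) >= interval_months
```

With a check like this the scheduler itself does not need to understand "every six months"; it only needs to call the task regularly, which keeps the same task usable from Celery Beat, crontab, or Jenkins.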
During my research I found:
- PRs where trying to add celery-beat became a general way of creating custom cron jobs via ansible.
- Adding celery-beat is the standard for opencraft's groove.
Now that we have mostly landed in a world where k8s is the way of hosting production-grade instances, and where tutor is the supported way of writing the manifests for said k8s clusters, I'd like to bring the question of the scheduler back to the forefront.
Specifically, I would like to know:
- why was celery beat not used in edx.org? What problems guided you in the direction of Jenkins?
- would the core project be open to installing celery-beat now as a dependency?
- if adding celery-beat is a no, what would the correct approach for smaller instances be?
Your insights and feedback will be greatly appreciated. Thank you in advance for your time and expertise in this matter.
I'm tagging some people who mostly guided those discussions in the past.