Performance Issues During Batch Enrollment in Open edX Quince Release

Hello Everyone,

We are currently running the Open edX Quince release in our production environment, using a single-server installation managed through Tutor (t2.xlarge EC2 instance). While the platform generally works well, we’ve encountered a significant issue during batch enrollment operations.

Our LMS is designed for institutions where self-enrollment is disabled, and all courses are invitation-only. Instructors frequently use the batch enrollment feature from the Instructor tab > Membership to enroll large groups of students (500+ per batch).

During this operation, the platform becomes unresponsive for all users. For example:

  • While a batch enrollment is in progress, if any user tries to access the site, the system does not respond or load as expected.
  • This significantly impacts the user experience, especially for institutions where batch enrollment is performed multiple times.

Given that this is a native/default feature of Open edX, the behavior is quite concerning and disruptive.

My Questions:

  1. Is there a way to optimize or prevent the site from becoming unresponsive during batch enrollment?
  2. Are there best practices or alternative methods for handling large-scale batch enrollments without affecting platform performance?
  3. Would moving to a multi-server or more scalable infrastructure mitigate this issue, or are there other tuning options we should explore?

Any insights, recommendations, or similar experiences shared by the community would be highly appreciated.

Thank you in advance for your support!

Hi @Mahendra,

From what I saw on AWS, a t2.xlarge EC2 instance has 4 vCPUs and 16 GB of RAM.
In my experience, that is too small an instance to deploy on and keep the platform responsive for multiple users.
If more hardware isn't in the budget, you can try limiting the number of Celery worker processes in the lms-worker and cms-worker, for example by adding --concurrency=1 so that each service starts a single worker instead of one worker per CPU (Celery's default behavior).
You would need to create a Tutor plugin that adds the --concurrency=1 flag to both LMS_WORKER_COMMAND and CMS_WORKER_COMMAND.
The trade-off is that you limit how many asynchronous tasks your platform can execute in parallel, which leaves more resources for your online users. Batch enrollment will take longer to complete; other impacts are that password-recovery emails will go out more slowly, course progress won't catch up as quickly after your learners complete tests/exams, and course certificates could take longer to generate.
To mitigate this, you could create another Tutor plugin that adds more lms-workers with --concurrency=1, one for each Celery queue that edx-platform uses.
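To make the plugin idea above concrete, here is a minimal sketch. It assumes Tutor's v1 plugin API (`tutor.hooks`) and the `local-docker-compose-services` patch; the worker commands shown are simplified placeholders, so copy the full default command (with all its flags) from your rendered `env/local/docker-compose.yml` rather than from this example, and verify on your Tutor version that redefining an existing service via this patch behaves as expected:

```python
# concurrency_plugin.py -- hypothetical sketch of a Tutor plugin that caps
# Celery at one worker process per service instead of one per CPU.
# Install under `tutor plugins printroot` and enable with
# `tutor plugins enable concurrency_plugin`.
from tutor import hooks

# Patch the docker-compose services so each worker starts with --concurrency=1.
# NOTE: command lines are simplified; merge the flag into the full default
# command from your generated env/local/docker-compose.yml.
hooks.Filters.ENV_PATCHES.add_item(
    (
        "local-docker-compose-services",
        """
lms-worker:
  command: celery --app=lms.celery worker --loglevel=info --concurrency=1
cms-worker:
  command: celery --app=cms.celery worker --loglevel=info --concurrency=1
""",
    )
)
```

After enabling the plugin, re-render the environment (`tutor config save`) and restart the services so the new worker commands take effect.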


Hi @IvoBranco

Thank you for your suggestion. We have also scaled up resources, such as upgrading to t2.2xlarge instances. However, we’ve observed that the batch enrollment task is not being executed in the background using Celery. As a result, this task does not get routed to the lms-worker and remains unaffected by changes in the worker configuration.

This behavior means that until the batch enrollment task is completed, all other requests are blocked. Additionally, during this time, we’ve noticed minimal increases in CPU or RAM utilization.

Please let me know if you have any suggestions for it.

Thanks again.

The T instances are not optimized for CPU-heavy tasks; try the M or C instances to see if performance improves.
Alternatively, you could inspect the code and change it to split the batch into multiple smaller chunks, iterating through them with a delay between chunks.
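The chunking idea can be sketched in plain Python. This is illustrative only, not edx-platform code: `enroll_one` is a hypothetical stand-in for whatever per-student enrollment call the instructor view actually makes, and the chunk size and delay are arbitrary starting points to tune:

```python
# chunked_enroll.py -- illustrative sketch of splitting a large enrollment
# list into small chunks with a pause between them, so web workers get a
# chance to serve other requests in between.
import time


def chunked(items, size):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def enroll_in_chunks(emails, enroll_one, chunk_size=50, pause_s=2.0):
    """Enroll students in small batches, sleeping between batches.

    `enroll_one` is a hypothetical callable that enrolls a single student;
    in a real patch it would wrap the platform's per-student enrollment call.
    """
    for chunk in chunked(emails, chunk_size):
        for email in chunk:
            enroll_one(email)
        time.sleep(pause_s)
```

The delay does make the overall batch slower, but it trades total throughput for responsiveness, which matches the problem described in this thread.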