Graceful shutdown on AWS EC2

0x29a · June 11, 2020, 6:22pm

We have run into a problem with the graceful shutdown of Open edX services on EC2.

As you may know, when you stop an EC2 instance, AWS sends an ACPI shutdown (power button) event to the instance's OS.
All services then receive SIGTERM and have only 30 seconds to finish their in-progress jobs.
After those 30 seconds, AWS powers off the instance, and this shutdown timeout is not configurable.
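
The sequence above (SIGTERM, a fixed grace period, then power-off) can be sketched with a minimal, hypothetical Python worker. This is not Open edX or Celery code: `GracefulWorker` and its parameters are names made up for illustration, and the 30-second default only mirrors the window described above. On SIGTERM the worker records the time, lets the job in progress finish, starts no new jobs, and reports whether it got done inside the grace period.

```python
import signal
import time


class GracefulWorker:
    """Hypothetical worker: after SIGTERM it finishes the current job but starts no new ones."""

    def __init__(self, grace_period_s=30.0):
        # 30 s mirrors the EC2 stop window described in the post; treat it as an assumption.
        self.grace_period_s = grace_period_s
        self.sigterm_at = None  # monotonic time of the first SIGTERM, if any

    def _on_sigterm(self, signum, frame):
        # Only record the shutdown request; the job in progress is allowed to finish.
        if self.sigterm_at is None:
            self.sigterm_at = time.monotonic()

    def run(self, jobs):
        """Run callables in order. Returns (jobs_completed, finished_within_grace_period)."""
        previous = signal.signal(signal.SIGTERM, self._on_sigterm)
        completed = 0
        try:
            for job in jobs:
                if self.sigterm_at is not None:
                    break  # shutdown has begun: do not start another job
                job()
                completed += 1
        finally:
            signal.signal(signal.SIGTERM, previous)  # restore the previous handler
        in_time = (
            self.sigterm_at is None
            or time.monotonic() - self.sigterm_at <= self.grace_period_s
        )
        return completed, in_time
```

For example, if the second of three jobs triggers SIGTERM (say via `signal.raise_signal(signal.SIGTERM)`), `run` returns `(2, True)`: the job in progress completes and the third one is never started. If that in-progress job needs longer than the grace period, the second value is `False`, which on EC2 is the point where the instance is powered off regardless.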

The problem is that Open edX services do not always manage to finish their in-progress tasks in time, so AWS kills them.
For example, during the last few redeployments we noticed that on some instances the old PID files of the discussions (forum) service were not cleaned up.
This means that some service blocks supervisor's shutdown process for a significant amount of time, so the forum does not get a chance to shut down gracefully and remove its PID file.
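
A leftover PID file is what a process leaves behind when it is killed before its cleanup code runs. The sketch below is a hypothetical Python illustration, not how the forum service actually manages its PID file: `pidfile` turns SIGTERM into `SystemExit` so that a `finally` block removes the file, and `is_stale_pidfile` detects a leftover file by probing the recorded PID with signal 0. No handler runs on SIGKILL or a hard power-off, which is exactly the case described above.

```python
import contextlib
import os
import signal
import sys


def _raise_system_exit(signum, frame):
    # Turn SIGTERM into a normal Python exit so `finally` blocks and context managers run.
    sys.exit(128 + signum)


@contextlib.contextmanager
def pidfile(path):
    """Write our PID to `path`; remove it on normal exit or SIGTERM (never on SIGKILL)."""
    previous = signal.signal(signal.SIGTERM, _raise_system_exit)
    with open(path, "w") as f:
        f.write(f"{os.getpid()}\n")
    try:
        yield
    finally:
        with contextlib.suppress(FileNotFoundError):
            os.remove(path)
        signal.signal(signal.SIGTERM, previous)


def is_stale_pidfile(path):
    """True if `path` records a PID that no longer exists (e.g. its owner was killed)."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # no PID file at all, so nothing is stale
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists; nothing is delivered
    except ProcessLookupError:
        return True
    except PermissionError:
        return False  # the process exists but belongs to another user
    return False
```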

We haven't been able to determine which service takes so long to finish in our case, but Celery, for example, can take an arbitrarily long time to finish an in-progress task.

In other words, any Celery task that still needs more than 30 seconds to complete when you stop the EC2 instance is killed mid-execution. That's bad, because the data being processed by that task can be lost.
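
One common way to limit this kind of loss is to split a long task into chunks and persist progress after each one, so a restarted task resumes from the last checkpoint instead of starting over or losing partial results. The sketch below is plain Python, not a Celery API: `process_in_chunks`, the JSON checkpoint file and `stop_event` are hypothetical, and in a real worker `stop_event` would be set from a SIGTERM handler. It assumes `handle` is idempotent, because a chunk interrupted by a hard kill is processed again on the next run.

```python
import json
import os


def _save_checkpoint(path, next_index):
    # Write to a temp file, then rename: a kill mid-write never leaves a half-written checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def process_in_chunks(items, handle, checkpoint_path, stop_event, chunk_size=100):
    """Process `items` from the last checkpoint; stop at a chunk boundary once
    `stop_event` is set. Returns True when every item has been handled."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for i in range(start, len(items), chunk_size):
        if stop_event.is_set():
            return False  # shutdown requested: leave the rest for the next run
        for item in items[i:i + chunk_size]:
            handle(item)  # must be idempotent; an interrupted chunk is redone
        _save_checkpoint(checkpoint_path, min(i + chunk_size, len(items)))
    return True
```

If `chunk_size` is small enough that one chunk reliably finishes within the grace period, a stop request costs at most one chunk of repeated work rather than the whole task.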

Are those who run their instances on EC2 aware of this problem, or do they perhaps handle it somehow?
