We stumbled upon a problem with the graceful shutdown of Open edX services on EC2 instances.
As you may know, when you stop an EC2 instance, AWS sends an ACPI shutdown button press event to the instance's OS.
All services then receive SIGTERM and have only 30 seconds to finish their in-progress jobs.
After those 30 seconds, AWS powers off the instance, and this shutdown timeout is not configurable.
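
For readers less familiar with what "graceful" means here, below is a minimal Python sketch of the usual pattern, assuming a simple single-threaded job loop. It is a generic illustration, not Open edX or Celery code; `GracefulWorker`, `on_sigterm`, `seconds_left` and the job list are hypothetical names. On SIGTERM the worker records when power-off will happen, stops starting new jobs, and lets the job already in flight finish.

```python
import signal
import time

# Grace period between SIGTERM and power-off, as quoted in this post.
GRACE_PERIOD_SECONDS = 30.0


class GracefulWorker:
    """Hypothetical worker loop (not Open edX or Celery code).

    On SIGTERM it stops starting new jobs but lets the running one finish.
    """

    def __init__(self):
        self.power_off_at = None  # time.monotonic() value, set on SIGTERM

    def install(self):
        # signal.signal() may only be called from the main thread.
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def on_sigterm(self, signum, frame):
        # Don't exit here; just remember when the instance will lose power.
        self.power_off_at = time.monotonic() + GRACE_PERIOD_SECONDS

    def seconds_left(self):
        """Remaining grace time, or None if no SIGTERM has arrived yet."""
        if self.power_off_at is None:
            return None
        return max(0.0, self.power_off_at - time.monotonic())

    def run(self, jobs):
        """Run jobs in order until shutdown is requested; return those finished."""
        finished = []
        for job in jobs:
            if self.power_off_at is not None:
                break  # shutdown requested: don't pick up new work
            job()
            finished.append(job)
        return finished
```

This only helps when the job in flight fits into the remaining grace time: a job that still needs more than `seconds_left()` is cut off by the power-off regardless of how politely the worker handles the signal.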
The problem is that Open edX services don't always manage to finish their in-progress tasks in time, so AWS kills them.
For example, during the last few redeployments we noticed that on some instances the old PID files of the discussions service were not cleaned up.
This means that some service blocks the supervisor shutdown process for a significant amount of time, so the forum doesn't manage to shut down gracefully and clean up its PID file.
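
One way to spot this symptom after a restart is to check whether a PID file points at a process that no longer exists. Here is a small Python sketch using only the standard library; `pid_file_is_stale` is a hypothetical helper, and the location of the forum's PID file depends on your deployment, so the path you pass in is yours to fill in. It relies on `os.kill(pid, 0)`, which checks for a process without actually sending a signal (POSIX only).

```python
import os


def pid_file_is_stale(path):
    """Return True if the PID file exists but its process is no longer running.

    Caveat: after a reboot the same PID may belong to an unrelated process,
    so a False result is not proof that the original service is alive.
    """
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False  # no PID file, so nothing was left behind
    except ValueError:
        return True  # empty or garbled contents: treat as leftover
    if pid <= 0:
        return True  # 0 and negative values are not valid process IDs
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending anything
    except ProcessLookupError:
        return True  # no such process: the file is stale
    except PermissionError:
        return False  # process exists but is owned by another user
    return False
```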
We haven't been able to determine which service takes so long to finish in our case, but
Celery, for example, can take an arbitrary amount of time to finish an in-progress task.
In other words, any Celery task that still needs more than 30 seconds to complete when you stop the EC2 instance is killed mid-execution.
That's bad, because the data being processed by that task can be lost.
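
To make the 30-second arithmetic concrete, here is a small Python sketch with made-up task timings; `RunningTask` and `killed_by_stop` are hypothetical names used only for illustration. A task is lost if it was already running when the stop was requested and would finish after the power-off. Task C runs for 100 seconds in total but survives, because most of its work is done before the stop; task B is lost even though the stop arrives only 5 seconds after it started.

```python
from dataclasses import dataclass
from typing import List

GRACE_PERIOD_SECONDS = 30.0  # figure quoted in this post


@dataclass
class RunningTask:
    name: str
    started_at: float  # seconds, on the same clock as the stop request
    duration: float    # total runtime the task needs, in seconds


def killed_by_stop(tasks: List[RunningTask], stop_requested_at: float,
                   grace: float = GRACE_PERIOD_SECONDS) -> List[str]:
    """Names of tasks that are still running when the instance powers off.

    Tasks started after the stop request are ignored here, on the
    assumption that a graceful worker would not start them.
    """
    power_off_at = stop_requested_at + grace
    return [t.name for t in tasks
            if t.started_at <= stop_requested_at        # running at stop time
            and t.started_at + t.duration > power_off_at]  # finishes too late


tasks = [
    RunningTask("A", started_at=90.0, duration=20.0),
    RunningTask("B", started_at=95.0, duration=60.0),
    RunningTask("C", started_at=10.0, duration=100.0),
    RunningTask("D", started_at=120.0, duration=5.0),
]
print(killed_by_stop(tasks, stop_requested_at=100.0))  # ['B']
```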
Are those of you who run your instances on
EC2 aware of this problem, or do you handle it somehow?