TLDR
If your monitoring is seeing a bunch of django.db.utils:OperationalError
errors with the message "Deadlock found when trying to get lock;"
, it can often be fixed by using decorators to exempt particular views from Django’s implicit request-level transaction. This PR shows how to do this for both function and class-based views:
But please read the rest of this post before making a PR to fix one of these…
What’s actually happening?
Studio and the LMS run with ATOMIC_REQUESTS
enabled by default. This means that Django will implicitly open a database transaction at the start of a request and commit it at the end of request processing. But this can cause issues when slow, concurrent requests are trying to update the same row.
As an example, say a request spends most of its time processing and rendering courseware but also updates a row somewhere to record the most recent course this user was active in. It’s going slowly for some reason–maybe it’s a particularly expensive problem type, or maybe the system is just under load and being slow for other reasons. So the student hits reload or clicks a button a few extra times. Now there are multiple requests out there that are probably slow for the same underlying reason, but are now also trying to get lock on the same row that is being updated in the first request’s transaction (which hasn’t closed yet).
Why use ATOMIC_REQUESTS
at all?
We used it way back when because it was easy to enable, and it usually does the thing you expect–i.e. keeping you from shooting yourself in the foot with inconsistent state because your request died halfway through the data you’re updating.
What’s the alternative?
The Right Thing to Do is to use transactions manually, in a much more thoughtful way than we do generally. Our apps are pretty inconsistent about this, but some of the newer ones wrap important operations in explicit transactions.
Why not turn off ATOMIC_REQUESTS
altogether?
Because it’s quite likely things will break in non-obvious, slowly data-corrupting ways, and it’s difficult to set aside time to test that across the large set of code that is edx-platform and the other apps it installs.
So is it dangerous to allow specific views to be non-atomic?
You’ll want to examine the view to figure out what it’s actually deadlocking on (which will be in the error message). If it’s doing something important, like creating an enrollment, purchasing something, etc.– then yes, it can be dangerous to do so, and you’ll need to think it through carefully. But this more commonly shows up in things that are either only changing a single row, or where the result is just not that sensitive (e.g. “most recently used course”).