CMS and LMS alternately timing out (on migrated legacy local tutor) [Tutor]

Issue:
Initially the LMS is working OK but the CMS is timing out. After a stop & launch/start, this is reversed: now the CMS is working but the LMS is timing out.

Background: I’ve inherited a legacy production local tutor setup that I’ve been tasked with migrating and upgrading.

The original is at v14, with a handful of customized modules, a custom plugin, and a custom theme. It runs the LMS and CMS on separate domains, uses Azure OAuth, and has to go through an outgoing corporate HTTP/S proxy.

I migrated the DB and content onto a test server running local v15, then upgraded to v16 (both versions show the behavior described below), with nginx as a TLS-terminating reverse proxy.
The upgrade process so far hasn’t been smooth, and required several tweaks and hacks to get this far, but I did eventually end up with a seemingly functional LMS and CMS on v16. Sort of.
(The eventual goal is to get it onto current version, via further incremental tutor upgrades)

I’m seeing some odd and inconsistent behavior that I’m not sure how best to troubleshoot or where to look:
When the LMS is working, the CMS isn’t, and vice versa.
By “not working” I mean: browsing to the LMS/CMS hangs and eventually (after a minute, or whatever timeout I set in nginx) returns a 504 Gateway Timeout from nginx.

I can’t see anything relevant in the tutor logs, except for the access request itself in the caddy logs, which only shows up after the timeout is reached. While the LMS or CMS is in this timing-out state, nothing relevant appears in its respective lms/cms logs.

So what I’m seeing is this:

  • After the server is restarted, usually (but not always) the LMS is working OK but the CMS is timing out, or the other way around.

  • Then, when the LMS is working OK, after a stop and launch/start (and no other changes), the CMS is now working but the LMS is timing out.

  • When the LMS isn’t working: from a shell in both the cms and lms containers I can successfully `curl http://cms:8000`, whereas `curl http://lms:8000` (from either container; same when using its Docker IP) just sits there and hangs.
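In that hung state it can help to distinguish “workers gone” from “workers wedged but listening”. A minimal sketch, run from a shell inside the affected container (entry via `tutor local exec lms bash`; the `python` check assumes a Python interpreter is on the image’s PATH, which it is in the standard openedx images):

```shell
# Inside the hanging container: are any uwsgi processes running at all?
# The [u] trick keeps grep from matching its own command line.
ps aux | grep '[u]wsgi'

# Does anything accept a TCP connect on port 8000?  A connect that times out
# (rather than being refused) matches the "curl just hangs" symptom.
timeout 5 python -c '
import socket
conn = socket.create_connection(("localhost", 8000), timeout=4)
print("port 8000 accepting connections")
conn.close()
'
```

If `ps` shows no uwsgi processes, the workers have died (or been killed), which would also explain why nothing shows up in the lms/cms logs while requests hang.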

Does this sound familiar to anyone?
Any clues as to where I should be looking, or how best to troubleshoot this?

Resolved: this was due to resource constraints on the dev VM. It turns out uwsgi was getting OOM-killed. The issue disappeared after bumping memory from 8 GB to 16 GB.
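For anyone hitting similar symptoms: OOM kills are quick to confirm from the Docker host. A sketch, assuming default tutor local container names (the `tutor_local-lms-1` / `tutor_local-cms-1` names are an assumption from recent compose naming; adjust to whatever `docker ps` shows):

```shell
# Kernel log: the OOM killer names its victim process (e.g. uwsgi)
sudo dmesg -T | grep -iE 'out of memory|oom-kill|killed process'

# Docker also records whether each container hit its memory limit
docker inspect --format '{{.Name}} OOMKilled={{.State.OOMKilled}}' \
  tutor_local-lms-1 tutor_local-cms-1

# Live per-container memory usage, to see who is close to the limit
docker stats --no-stream
```

Note that `State.OOMKilled` only flips to true when Docker’s own memory limit kills the container; a host-level OOM kill of a single uwsgi worker shows up in `dmesg` but can leave the container “running” with nothing serving requests, which matches the silent hangs above.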