Tutor k8s: intermittent severe errors

I recently upgraded to Tutor 16.1.7.
Sometimes when I open the LMS, I get a black screen with just a "Server error" message and no further details.
Today I ran into a similar issue: reloading the page made it load normally, but when I went into a course I kept seeing "There is an error loading this course".
I knew right away that one of the worker nodes must have had a problem.
I checked with kubectl get pods -n openedx, and all pods were running.
I ran kubectl top nodes, and all nodes looked healthy with low resource usage.
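One thing worth checking: the STATUS "Running" in kubectl get pods only reflects the pod phase, so a pod can be Running while one of its containers is not Ready or keeps restarting. A minimal sketch, assuming you save the pod list as JSON with kubectl get pods -n openedx -o json > pods.json; the script and function names (find_suspect_pods) and the restart threshold are hypothetical. Grouping the results by node can point at a misbehaving worker node.

```python
import json
import sys


def find_suspect_pods(pod_list, max_restarts=0):
    """Return (pod, node, reason) tuples for pods that look unhealthy.

    pod_list is the parsed output of `kubectl get pods -o json` (a v1 PodList).
    Hypothetical helper: thresholds and rules are illustrative only.
    """
    suspects = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        node = pod.get("spec", {}).get("nodeName") or "<unscheduled>"
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            # Completed job pods (e.g. init jobs) are expected to have exited.
            continue
        if phase != "Running":
            suspects.append((name, node, f"phase={phase}"))
            continue
        # A Running pod can still have containers that fail readiness or crash-loop.
        for cs in status.get("containerStatuses", []):
            if not cs.get("ready", False):
                suspects.append((name, node, f"container {cs['name']} not ready"))
            restarts = cs.get("restartCount", 0)
            if restarts > max_restarts:
                suspects.append(
                    (name, node, f"container {cs['name']} restarted {restarts} times")
                )
    return suspects


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        pods = json.load(f)
    # Sort by node so problems concentrated on one worker stand out.
    for name, node, reason in sorted(find_suspect_pods(pods), key=lambda r: r[1]):
        print(f"{node}\t{name}\t{reason}")
```

Usage: python find_suspect_pods.py pods.json. If the failures cluster on one node, inspecting that node (kubectl describe node) is a reasonable next step before replacing it.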

I checked MySQL and MongoDB; both were running fine with low usage.
I tried tutor k8s stop followed by tutor k8s start, but the issue remained.

In the end I had to kill both worker nodes, which was very painful because of Caddy's pod affinity.
After that, everything worked fine.

I suspect the node volume was full, though I didn't check because I had to fix the error ASAP. I had given it 20GB.
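One way to test the full-volume theory next time: the kubelet reports a DiskPressure condition on a node when its filesystem crosses the eviction threshold, and Ready goes to something other than True when the node is unhealthy. A minimal sketch, assuming you save the node list with kubectl get nodes -o json > nodes.json; the function name node_problems is hypothetical. Note that DiskPressure only fires at the kubelet's eviction thresholds, so checking df -h on the node itself is still useful.

```python
import json
import sys

# Node condition types where status "True" indicates a problem.
PRESSURE_CONDITIONS = ("DiskPressure", "MemoryPressure", "PIDPressure")


def node_problems(node_list):
    """Map node name -> list of "Type=Status" problem strings.

    node_list is the parsed output of `kubectl get nodes -o json` (a v1 NodeList).
    Hypothetical helper for illustration only.
    """
    problems = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        issues = []
        for cond in node.get("status", {}).get("conditions", []):
            ctype, cstatus = cond.get("type"), cond.get("status")
            if ctype in PRESSURE_CONDITIONS and cstatus == "True":
                issues.append(f"{ctype}={cstatus}")
            elif ctype == "Ready" and cstatus != "True":
                # "False" or "Unknown" both mean the node is not healthy.
                issues.append(f"{ctype}={cstatus}")
        if issues:
            problems[name] = issues
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        nodes = json.load(f)
    for name, issues in node_problems(nodes).items():
        print(f"{name}: {', '.join(issues)}")
```

Usage: python node_problems.py nodes.json. Reading the conditions from JSON avoids parsing the human-oriented table output of kubectl describe node.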

Has anyone run into a similar issue? How can I effectively debug and fix this?