Cannot import or export classes (or run grading, or submit to problems)

I’m running Nutmeg and my VM on AWS became unresponsive over ssh on Sunday. I was forced to reboot it. I checked the filesystem and it wasn’t a problem that it had no space remaining or anything like that.

Since I rebooted, I am no longer able to run grade reports (they all just say “QUEUING” and never complete), I can’t import or export a class (it is permanently stuck on “Exporting” for exports even when I leave the page and come back, and “Unpacking” for imports), and when I test as a student account, I can’t submit answers to multiple choice problems, and completion xblocks that are marked done are persisted as marked done, but they’re not captured as done according to the course Progress page (completion xblocks are normally treated like 1-point problems.)

These symptoms tend to make me think perhaps there’s something broken about the mysql database (but not the mongodb where supposedly I read in the past completion xblocks’ information is stored)?

What is the best way to debug if there is a database error going on? (I haven’t been able to find anything interesting in the typical tutor local logs --tail=1000 output.)

Thus far I have tried the typical rebuilding of the docker container and restarting via tutor local quickstart but the error has persisted.

I also note that there is one error during “tutor images build openedx”

Step 22/103 : RUN curl -fsSL https://github.com/open-craft/edx-platform/commit/3d54f284f82b61e693ad652d8d6e46a226fcb36d.patch | git am
 ---> Running in f0ae43c43caf
curl: (22) The requested URL returned error: 404 

Logs grepped for “error”, with nothing obvious showing up to me:

tutor local logs --tail=1000 | grep -i error
cms-worker_1                 | [2023-02-27 18:33:19,383: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | [2023-02-27 18:33:21,398: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
cms-worker_1                 | [2023-02-27 18:33:25,419: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
lms-worker_1                 | 2023-02-27 18:33:19,937 ERROR 1 [celery.worker.consumer.consumer] [user None] [ip None] consumer.py:451 - consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
lms-worker_1                 | 2023-02-27 18:33:21,951 ERROR 1 [celery.worker.consumer.consumer] [user None] [ip None] consumer.py:451 - consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
lms-worker_1                 | 2023-02-27 18:33:25,959 ERROR 1 [celery.worker.consumer.consumer] [user None] [ip None] consumer.py:451 - consumer: Cannot connect to redis://redis:6379/0: Redis is loading the dataset in memory.
caddy_1                      | {"level":"error","ts":1677522807.8127453,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"1.1.1.1:58431","proto":"HTTP/2.0","method":"POST","host":"my.site.com","uri":"/courses/course-v1:MySite+Class1+V1/instructor/api/list_instructor_tasks","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":33.453857414,"size":0,"status":403}
caddy_1                      | {"level":"error","ts":1677522834.8444057,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"1.1.1.1:58431","proto":"HTTP/2.0","method":"POST","host":"my.site.com","uri":"/courses/course-v1:MySite+Class1+V1/instructor/api/list_instructor_tasks","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.594305346,"size":0,"status":403}
caddy_1                      | {"level":"error","ts":1677522894.3375554,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"1.1.1.1:58431","proto":"HTTP/2.0","method":"POST","host":"my.site.com","uri":"/courses/course-v1:MySite+Class1+V1/instructor/api/list_instructor_tasks","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.088728878,"size":0,"status":403}
caddy_1                      | {"level":"error","ts":1677522902.3015223,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"1.1.1.1:58431","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/enterprise/api/v1/enterprise-learner/?username=User1","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.250758609,"size":9504,"status":404}
caddy_1                      | {"level":"error","ts":1677522954.4726045,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"1.1.1.1:58431","proto":"HTTP/2.0","method":"POST","host":"my.site.com","uri":"/courses/course-v1:MySite+Class1+V1/instructor/api/list_instructor_tasks","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.219338132,"size":1019,"status":403}
elasticsearch_1              | {"type": "server", "timestamp": "2023-02-26T14:24:14,818Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openedx", "node.name": "e9c9df28b1e2", "message": "JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/elasticsearch-10523025246243657187, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms1g, -Xmx1g, -XX:MaxDirectMemorySize=536870912, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }
caddy_1                      | {"level":"error","ts":1677523358.741058,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"3.3.3.3:56224","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/static/xblock/resources/done/public/check-full.png","tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.235543799,"size":9510,"status":404}
caddy_1                      | {"level":"error","ts":1677523363.6825814,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:50847","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/api/user_tours/v1/User3","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.080648439,"size":0,"status":404}
caddy_1                      | {"level":"error","ts":1677523366.4745123,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:50847","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/static/xblock/resources/done/public/check-empty.png","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.147255096,"size":9508,"status":404}
elasticsearch_1              | {"type": "server", "timestamp": "2023-02-26T14:51:20,285Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openedx", "node.name": "e9c9df28b1e2", "message": "JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/elasticsearch-17380630067652297396, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms1g, -Xmx1g, -XX:MaxDirectMemorySize=536870912, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }
caddy_1                      | {"level":"error","ts":1677523369.1003125,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:50847","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/apps.my.site.com/src/course-home/outline-tab/LmsHtmlFragment.css","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.160525264,"size":9508,"status":404}
caddy_1                      | {"level":"error","ts":1677523369.2665875,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:50847","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/api/edx_proctoring/v1/user_onboarding/status?is_learning_mfe=true&course_id=course-v1%3AMySite%2BClass1%2BV1&username=User3","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.09508568,"size":67,"status":404}
caddy_1                      | {"level":"error","ts":1677523369.4550166,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:50847","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/apps.my.site.com/src/course-home/outline-tab/LmsHtmlFragment.css","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.218113326,"size":9508,"status":404}
caddy_1                      | {"level":"error","ts":1677523369.5173433,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:50847","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/api/user_tours/v1/User3","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.105177614,"size":0,"status":404}
caddy_1                      | {"level":"error","ts":1677523369.7331588,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:50847","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/apps.my.site.com/src/course-home/outline-tab/LmsHtmlFragment.css","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.142734102,"size":9508,"status":404}
caddy_1                      | {"level":"error","ts":1677523370.826845,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","error":"http2: stream closed"}
caddy_1                      | {"level":"error","ts":1677523395.9024441,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:50882","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/apps.my.site.com/src/course-home/outline-tab/LmsHtmlFragment.css","tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.26288684,"size":8295,"status":404}
caddy_1                      | {"level":"error","ts":1677523397.7204595,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"3.3.3.3:56224","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/api/certificates/v0/certificates/User2/","tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.079329645,"size":63,"status":403}
elasticsearch_1              | {"type": "server", "timestamp": "2023-02-26T17:00:35,833Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openedx", "node.name": "e9c9df28b1e2", "message": "JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/elasticsearch-14550384431915640575, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms1g, -Xmx1g, -XX:MaxDirectMemorySize=536870912, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }
caddy_1                      | {"level":"error","ts":1677523509.6227825,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"3.3.3.3:56224","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/api/certificates/v0/certificates/User2/","tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.148598795,"size":63,"status":403}
forum_1                      | /app/cs_comments_service/vendor/bundle/ruby/2.5.0/gems/unicorn-5.3.1/lib/unicorn/http_server.rb:205:in `pid=': Already running on PID:14 (or pid=/tmp/forum_unicorn.pid is stale) (ArgumentError)
forum_1                      | /app/cs_comments_service/vendor/bundle/ruby/2.5.0/gems/unicorn-5.3.1/lib/unicorn/http_server.rb:205:in `pid=': Already running on PID:14 (or pid=/tmp/forum_unicorn.pid is stale) (ArgumentError)
caddy_1                      | {"level":"error","ts":1677523589.4326928,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"3.3.3.3:56224","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/static/xblock/resources/done/public/check-empty.png","tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.160680944,"size":9510,"status":404}
elasticsearch_1              | {"type": "server", "timestamp": "2023-02-26T17:10:26,144Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openedx", "node.name": "e9c9df28b1e2", "message": "JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/elasticsearch-2847894554714149840, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms1g, -Xmx1g, -XX:MaxDirectMemorySize=536870912, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }
forum_1                      | /app/cs_comments_service/vendor/bundle/ruby/2.5.0/gems/unicorn-5.3.1/lib/unicorn/http_server.rb:205:in `pid=': Already running on PID:14 (or pid=/tmp/forum_unicorn.pid is stale) (ArgumentError)
forum_1                      | /app/cs_comments_service/vendor/bundle/ruby/2.5.0/gems/unicorn-5.3.1/lib/unicorn/http_server.rb:205:in `pid=': Already running on PID:14 (or pid=/tmp/forum_unicorn.pid is stale) (ArgumentError)
elasticsearch_1              | {"type": "server", "timestamp": "2023-02-26T18:11:56,233Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openedx", "node.name": "e9c9df28b1e2", "message": "JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/elasticsearch-8115212296896162819, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms1g, -Xmx1g, -XX:MaxDirectMemorySize=536870912, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }
caddy_1                      | {"level":"error","ts":1677523803.8927326,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:51168","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/static/xblock/resources/done/public/check-empty.png","tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.153876993,"size":9508,"status":404}
elasticsearch_1              | {"type": "server", "timestamp": "2023-02-27T18:32:51,680Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "openedx", "node.name": "e9c9df28b1e2", "message": "JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms1g, -Xmx1g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/elasticsearch-12070135537405910665, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=data, -XX:ErrorFile=logs/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -Des.cgroups.hierarchy.override=/, -Xms1g, -Xmx1g, -XX:MaxDirectMemorySize=536870912, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/usr/share/elasticsearch/config, -Des.distribution.flavor=default, -Des.distribution.type=docker, -Des.bundled_jdk=true]" }
caddy_1                      | {"level":"error","ts":1677524030.795671,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_addr":"2.2.2.2:51168","proto":"HTTP/2.0","method":"GET","host":"my.site.com","uri":"/static/xblock/resources/done/public/check-full.png","tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h2","proto_mutual":true,"server_name":"my.site.com"}},"user_id":"","duration":0.232267946,"size":9508,"status":404}

One thing I noticed in the logs around hits for “grade_course” on the failing server is this:

2023-02-27 21:35:48,938 INFO 7 [openedx.core.djangoapps.cors_csrf.helpers] [user 3] [ip 1.2.3.4] helpers.py:64 - Origin 'https://my.site.com' was not in `CORS_ORIGIN_WHITELIST`; full referer was 'https://my.site.com/courses/course-v1:MyOrg+MyClass+V1/instructor' and requested host was 'my.site.com'; CORS_ORIGIN_ALLOW_ALL=False

On a working server, I don’t see that. Could that have anything to do with it? (I don’t see anything set under the working site’s admin/cors_csrf/ page though FWIW, and I’ve never set that anywhere AFAIK before.

A sub-question of the “What is the best way to debug if there is a database error going on?” question would be, because the requests to gather grade reports are pending tasks, how can I track where the initial pending task is put (e.g. based on the Task ID column that is printed out), and what code attempts to take them off the pending task list and process them?

@regis I think I may be experiencing the same issue you attempted to help @MaartenGH with on this thread. Like him I couldn’t send emails, import/export classes anymore etc.

First of all, I have more than 2x the redis data than he did:

ls -lh $(tutor config printroot)/data/redis
total 8.7G
-rw-r--r-- 1 ubuntu ubuntu 7.3G Feb 28 20:12 appendonly.aof
-rw-r--r-- 1 ubuntu ubuntu 833M Feb 28 20:09 dump.rdb
-rw-r--r-- 1 ubuntu ubuntu 572M Feb 28 03:08 temp-1.rdb

Second of all, when I run

tutor local stop redis
sudo mv $(tutor config printroot)/data/redis $(tutor config printroot)/data/redis.backup
tutor local restart redis cms-worker

it seems like the redis connection never comes back up:

ubuntu@ip-172-31-4-22:~$ tutor local restart redis cms-worker
docker-compose -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.yml -f /home/ubuntu/.local/share/tutor/env/dev/docker-compose.yml -f /home/ubuntu/.local/share/tutor/env/dev/docker-compose.tmp.yml --project-name tutor_dev stop
docker-compose -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.yml -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.prod.yml -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.tmp.yml --project-name tutor_local restart redis cms-worker
Restarting tutor_local_cms-worker_1 ... done
Restarting tutor_local_redis_1      ... done
ubuntu@ip-172-31-4-22:~$ tutor local logs --tail=100 -f | grep -v caddy
{removed due to size limit}
redis-permissions_1          | /openedx/redis/data already owned by 1000
redis-permissions_1          | /openedx/redis/data already owned by 1000
redis-permissions_1          | /openedx/redis/data already owned by 1000
redis_1                      | 1:C 28 Feb 2023 20:18:28.801 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:18:28.802 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:18:28.802 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:18:28.802 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:18:28.803 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:18:54.869 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:18:54.869 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:18:54.869 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:18:54.870 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:18:54.870 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:19:46.533 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:19:46.533 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:19:46.533 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:19:46.534 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:19:46.534 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:20:46.979 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:20:46.979 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:20:46.979 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:20:46.980 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:20:46.980 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:21:13.868 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:21:13.868 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:21:13.868 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:21:13.869 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:21:13.869 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:21:14.498 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:21:14.498 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:21:14.498 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:21:14.498 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:21:14.498 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:21:15.237 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:21:15.237 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:21:15.237 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:21:15.238 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:21:15.238 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:21:16.609 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:21:16.609 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:21:16.609 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:21:16.611 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:21:16.611 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:21:18.127 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:21:18.144 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:21:18.144 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:21:18.146 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:21:18.148 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:21:20.408 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:21:20.408 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:21:20.408 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:21:20.410 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:21:20.410 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:21:24.278 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:21:24.280 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:21:24.280 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:21:24.282 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:21:24.282 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:21:31.147 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:21:31.147 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:21:31.147 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:21:31.148 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:21:31.149 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:21:44.396 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:21:44.396 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:21:44.396 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:21:44.397 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:21:44.397 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:22:10.536 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:22:10.536 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:22:10.536 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:22:10.537 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:22:10.537 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:22:15.745 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:22:15.745 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:22:15.745 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:22:15.746 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:22:15.746 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:22:16.346 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:22:16.346 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:22:16.346 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:22:16.347 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:22:16.348 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:22:17.051 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:22:17.051 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:22:17.051 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:22:17.052 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:22:17.052 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:22:18.324 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:22:18.324 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:22:18.324 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:22:18.332 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:22:18.332 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:22:19.800 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:22:19.801 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:22:19.801 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:22:19.802 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:22:19.802 # Can't open the append-only file: Permission denied
redis_1                      | 1:C 28 Feb 2023 20:22:22.029 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1                      | 1:C 28 Feb 2023 20:22:22.032 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1                      | 1:C 28 Feb 2023 20:22:22.032 # Configuration loaded
redis_1                      | 1:M 28 Feb 2023 20:22:22.034 * monotonic clock: POSIX clock_gettime
redis_1                      | 1:M 28 Feb 2023 20:22:22.034 # Can't open the append-only file: Permission denied
tutor_local_redis-permissions_1 exited with code 0
smtp_1                       | 2023-02-28 01:38:46 exim 4.95 daemon started: pid=7, -q15m, listening for SMTP on port 8025 (IPv6 and IPv4)
smtp_1                       | 2023-02-28 01:38:47 Start queue run: pid=8
smtp_1                       | 2023-02-28 01:38:47 End queue run: pid=8
smtp_1                       | 2023-02-28 01:53:47 Start queue run: pid=9
smtp_1                       | 2023-02-28 01:53:47 End queue run: pid=9
smtp_1                       | 2023-02-28 02:08:47 Start queue run: pid=10
smtp_1                       | 2023-02-28 02:08:47 End queue run: pid=10
smtp_1                       | 2023-02-28 02:20:41 exim 4.95 daemon started: pid=6, -q15m, listening for SMTP on port 8025 (IPv6 and IPv4)
smtp_1                       | 2023-02-28 02:20:42 Start queue run: pid=7
smtp_1                       | 2023-02-28 02:20:42 End queue run: pid=7
smtp_1                       | 2023-02-28 02:35:42 Start queue run: pid=8
smtp_1                       | 2023-02-28 02:35:42 End queue run: pid=8
smtp_1                       | 2023-02-28 02:50:42 Start queue run: pid=9
smtp_1                       | 2023-02-28 02:50:42 End queue run: pid=9
smtp_1                       | 2023-02-28 03:05:42 Start queue run: pid=10
smtp_1                       | 2023-02-28 03:05:42 End queue run: pid=10
smtp_1                       | 2023-02-28 19:36:14 exim 4.95 daemon started: pid=7, -q15m, listening for SMTP on port 8025 (IPv6 and IPv4)
smtp_1                       | 2023-02-28 19:36:15 Start queue run: pid=8
smtp_1                       | 2023-02-28 19:36:15 End queue run: pid=8
smtp_1                       | 2023-02-28 19:51:15 Start queue run: pid=9
smtp_1                       | 2023-02-28 19:51:15 End queue run: pid=9
smtp_1                       | 2023-02-28 20:06:15 Start queue run: pid=10
smtp_1                       | 2023-02-28 20:06:15 End queue run: pid=10
smtp_1                       | 2023-02-28 20:21:15 Start queue run: pid=11
smtp_1                       | 2023-02-28 20:21:15 End queue run: pid=11
tutor_local_redis_1 exited with code 1
cms-worker_1                 |  
cms-worker_1                 |  -------------- celery@edx.cms.core.default.%6d7c90097c7e v5.2.6 (dawn-chorus)
cms-worker_1                 | --- ***** ----- 
cms-worker_1                 | -- ******* ---- Linux-5.15.0-1030-aws-x86_64-with-glibc2.29 2023-02-28 20:22:29
cms-worker_1                 | - *** --- * --- 
cms-worker_1                 | - ** ---------- [config]
cms-worker_1                 | - ** ---------- .> app:         proj:0x7f163fb11d00
cms-worker_1                 | - ** ---------- .> transport:   redis://redis:6379/0
cms-worker_1                 | - ** ---------- .> results:     
cms-worker_1                 | - *** --- * --- .> concurrency: 2 (prefork)
cms-worker_1                 | -- ******* ---- .> task events: ON
cms-worker_1                 | --- ***** ----- 
cms-worker_1                 |  -------------- [queues]
cms-worker_1                 |                 .> edx.cms.core.default exchange=edx.cms.core(direct) key=edx.cms.core.default
cms-worker_1                 |                 .> edx.cms.core.high exchange=edx.cms.core(direct) key=edx.cms.core.high
cms-worker_1                 |                 .> edx.cms.core.low exchange=edx.cms.core(direct) key=edx.cms.core.low
cms-worker_1                 | 
cms-worker_1                 | [tasks]
cms-worker_1                 |   . celery_utils.tasks.mark_resolved
cms-worker_1                 |   . cms.djangoapps.cms_user_tasks.tasks.send_task_complete_email
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.export_olx
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.import_olx
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.rerun_course
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_all_outlines_from_modulestore_task
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_library_index
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_outline_from_modulestore_task
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_search_index
cms-worker_1                 |   . cms.djangoapps.contentstore.tasks.update_special_exams_and_publish
cms-worker_1                 |   . cms.djangoapps.coursegraph.tasks.dump_course_to_neo4j
cms-worker_1                 |   . cms.djangoapps.export_course_metadata.tasks.export_course_metadata_task
cms-worker_1                 |   . common.djangoapps.entitlements.tasks.expire_old_entitlements
cms-worker_1                 |   . edx_name_affirmation.tasks.delete_verified_name_task
cms-worker_1                 |   . edx_name_affirmation.tasks.idv_update_verified_name_task
cms-worker_1                 |   . edx_name_affirmation.tasks.proctoring_update_verified_name_task
cms-worker_1                 |   . edx_sga.tasks.zip_student_submissions
cms-worker_1                 |   . enterprise.tasks.create_enterprise_enrollment
cms-worker_1                 |   . enterprise.tasks.send_enterprise_email_notification
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.cleanup_duplicate_assignment_records
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_content_metadata
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_single_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_single_subsection_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.transmit_subsection_learner_data
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.unlink_inactive_learners
cms-worker_1                 |   . integrated_channels.integrated_channel.tasks.update_content_transmission_catalog
cms-worker_1                 |   . lms.djangoapps.certificates.tasks.generate_certificate
cms-worker_1                 |   . lms.djangoapps.discussion.tasks.send_ace_message
cms-worker_1                 |   . lms.djangoapps.discussion.tasks.update_discussions_map
cms-worker_1                 |   . lms.djangoapps.grades.tasks.compute_all_grades_for_course
cms-worker_1                 |   . lms.djangoapps.grades.tasks.compute_grades_for_course
cms-worker_1                 |   . lms.djangoapps.grades.tasks.compute_grades_for_course_v2
cms-worker_1                 |   . lms.djangoapps.grades.tasks.recalculate_course_and_subsection_grades_for_user
cms-worker_1                 |   . lms.djangoapps.grades.tasks.recalculate_subsection_grade_v3
cms-worker_1                 |   . lms.djangoapps.instructor.tasks.update_exam_completion_task
cms-worker_1                 |   . lms.djangoapps.verify_student.tasks.send_request_to_ss_for_user
cms-worker_1                 |   . lms.djangoapps.verify_student.tasks.send_verification_status_email
cms-worker_1                 |   . openedx.core.djangoapps.bookmarks.tasks.update_xblocks_cache
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.get_course_in_cache_v2
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache
cms-worker_1                 |   . openedx.core.djangoapps.content.block_structure.tasks.update_course_in_cache_v2
cms-worker_1                 |   . openedx.core.djangoapps.content.course_overviews.tasks.async_course_overview_update
cms-worker_1                 |   . openedx.core.djangoapps.content_libraries.tasks.import_blocks_from_course
cms-worker_1                 |   . openedx.core.djangoapps.course_apps.tasks.cache_all_course_apps_status
cms-worker_1                 |   . openedx.core.djangoapps.course_apps.tasks.update_course_apps_status
cms-worker_1                 |   . openedx.core.djangoapps.credit.tasks.update_credit_course_requirements
cms-worker_1                 |   . openedx.core.djangoapps.discussions.tasks.update_discussions_settings_from_course_task
cms-worker_1                 |   . openedx.core.djangoapps.heartbeat.tasks.sample_task
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.award_course_certificate
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.award_program_certificates
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.revoke_program_certificates
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.update_certificate_visible_date_on_course_update
cms-worker_1                 |   . openedx.core.djangoapps.programs.tasks.update_credentials_course_certificate_configuration_available_date
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseNextSectionUpdate
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleCourseUpdate
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleRecurringNudge
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.ScheduleUpgradeReminder
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks._course_update_schedule_send
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks._recurring_nudge_schedule_send
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks._upgrade_reminder_schedule_send
cms-worker_1                 |   . openedx.core.djangoapps.schedules.tasks.update_course_schedules
cms-worker_1                 |   . openedx.core.djangoapps.service_status.tasks.delayed_ping
cms-worker_1                 |   . openedx.core.djangoapps.user_authn.tasks.check_pwned_password_and_send_track_event
cms-worker_1                 |   . openedx.core.djangoapps.user_authn.tasks.send_activation_email
cms-worker_1                 |   . openedx.core.djangoapps.verified_track_content.tasks.sync_cohort_with_mode
cms-worker_1                 |   . super_csv.mixins.do_deferred_commit
cms-worker_1                 |   . user_tasks.tasks.purge_old_user_tasks
cms-worker_1                 | 
cms-worker_1                 | [2023-02-28 20:22:29,551: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -3 connecting to redis:6379. Temporary failure in name resolution..
cms-worker_1                 | Trying again in 2.00 seconds... (1/100)
cms-worker_1                 | 
cms-worker_1                 | [2023-02-28 20:22:31,557: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -3 connecting to redis:6379. Temporary failure in name resolution..
cms-worker_1                 | Trying again in 4.00 seconds... (2/100)
cms-worker_1                 | 
cms-worker_1                 | [2023-02-28 20:22:35,565: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -3 connecting to redis:6379. Temporary failure in name resolution..
cms-worker_1                 | Trying again in 6.00 seconds... (3/100)
cms-worker_1                 | 
cms-worker_1                 | [2023-02-28 20:22:41,575: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -3 connecting to redis:6379. Temporary failure in name resolution..
cms-worker_1                 | Trying again in 8.00 seconds... (4/100)
cms-worker_1                 | 
tutor_local_redis_1 exited with code 1
cms-worker_1                 | [2023-02-28 20:22:49,585: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -3 connecting to redis:6379. Temporary failure in name resolution..
cms-worker_1                 | Trying again in 10.00 seconds... (5/100)
cms-worker_1                 | 
cms-worker_1                 | [2023-02-28 20:22:59,599: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -3 connecting to redis:6379. Temporary failure in name resolution..
cms-worker_1                 | Trying again in 12.00 seconds... (6/100)
cms-worker_1                 | 
tutor_local_redis_1 exited with code 1
cms-worker_1                 | [2023-02-28 20:23:14,683: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error 113 connecting to redis:6379. No route to host..
cms-worker_1                 | Trying again in 14.00 seconds... (7/100)
cms-worker_1                 | 
cms-worker_1                 | [2023-02-28 20:23:28,693: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -3 connecting to redis:6379. Temporary failure in name resolution..
cms-worker_1                 | Trying again in 16.00 seconds... (8/100)
cms-worker_1                 | 
cms-worker_1                 | [2023-02-28 20:23:44,710: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -3 connecting to redis:6379. Temporary failure in name resolution..
cms-worker_1                 | Trying again in 18.00 seconds... (9/100)
cms-worker_1                 | 
cms-worker_1                 | [2023-02-28 20:24:02,733: ERROR/MainProcess] consumer: Cannot connect to redis://redis:6379/0: Error -3 connecting to redis:6379. Temporary failure in name resolution..
cms-worker_1                 | Trying again in 20.00 seconds... (10/100)

However, once I did a quickstart (tutor 14.0.3, Nutmeg), all the broken functionality was working again!

Regis, can you please explain the implications of moving the redis data out of the way, and whether it’s safe to delete, or whether I should keep it around? (I.e. if I delete it am I going to lose some important logs that I might want to search through later for information or statistics that doesn’t exist in either mysql or mongodb?)

Also now that I have snapshots of a system in this state, please let me know if there’s any other tests you’d like me to do to help root-cause the issue, before I move on? E.g. should I put it back in place and confirm the system goes back to broken?

Is it possible you weren’t able to recreate it because Redis is an in-memory db/cache, and you didn’t make a file that was big enough to exhaust your RAM? (only 10^7 * 256 bytes ~= 2.5GB) My server has 8GB RAM, and my file was 7.3GB, so if redis was trying to pull it all into memory then it was presumably exhausting all memory and causing swapping and other issues…

@MaartenGH how much RAM did your server have where you encountered it? 4GB? (Since your file was 3.6GB so maybe it was also causing RAM exhaustion.)

Thank you

1 Like

This data should be safe to delete. The most critical data in there are the asynchronous celery tasks, such as sending emails, exporting courses, etc. If you delete this data, then the tasks that were ordered but not yet triggered will never complete.

The permission issue stems from the fact that you deleted the redis data folder. It was re-created by docker-compose with root ownership. tutor local quickstart takes care of fixing the permissions.

I’m wondering if the issue is caused by the fact that you have such a large appendonly.aof file. If you have expertise with Redis, you can search for the configuration settings that will trigger a rewrite at the right time.

I had literally never even heard of Redis before this problem :slight_smile: So no, I don’t think I’d be able to find how to check things. I’m just offering to run any commands you request on a system in a known-broken state and provide the log output. If you’re not interested, that’s fine, but LMK so I can delete the Amazon snapshots and save some money.

OK. Thanks for offering your help.

  1. Please restore your old data.
  2. Start your redis cluster with tutor local start -d redis.
  3. Wait until redis is up: you should be able to see its status update by running tutor local logs --tail=100 -f redis.
  4. Once redis is up, run the following command: tutor local exec redis redis-cli BGREWRITEAOF. It should take some time to run. After it completes, what is the size of the appendonly.aof file?

832MB

FWIW the last command didn’t take a long time, it completed almost immediately.

ls -lah $(tutor config printroot)/data/redis
total 8.9G
drwxr-xr-x  2 ubuntu ubuntu 4.0K Mar  9 19:12 .
drwxr-xr-x 11 ubuntu ubuntu 4.0K Jul 18  2022 ..
-rw-r--r--  1 ubuntu ubuntu 7.3G Mar  9 19:12 appendonly.aof
-rw-r--r--  1 ubuntu ubuntu 832M Mar  9 19:12 dump.rdb
-rw-r--r--  1 ubuntu ubuntu 740M Mar  3 17:06 temp-1.rdb
ubuntu@ip-172-31-8-105:~$ tutor local start -d redis
docker-compose -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.yml -f /home/ubuntu/.local/share/tutor/env/dev/docker-compose.yml -f /home/ubuntu/.local/share/tutor/env/dev/docker-compose.tmp.yml --project-name tutor_dev stop
docker-compose -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.yml -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.prod.yml -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.tmp.yml --project-name tutor_local up --remove-orphans --build -d redis
Starting tutor_local_redis-permissions_1 ... done
tutor_local_redis_1 is up-to-date
ubuntu@ip-172-31-8-105:~$ tutor local exec redis redis-cli BGREWRITEAOF
docker-compose -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.yml -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.prod.yml -f /home/ubuntu/.local/share/tutor/env/local/docker-compose.tmp.yml --project-name tutor_local exec redis redis-cli BGREWRITEAOF
Background append only file rewriting started
ubuntu@ip-172-31-8-105:~$ ls -lah $(tutor config printroot)/data/redis
total 2.4G
drwxr-xr-x  2 ubuntu ubuntu 4.0K Mar  9 19:13 .
drwxr-xr-x 11 ubuntu ubuntu 4.0K Jul 18  2022 ..
-rw-r--r--  1 ubuntu ubuntu 832M Mar  9 19:15 appendonly.aof
-rw-r--r--  1 ubuntu ubuntu 832M Mar  9 19:12 dump.rdb
-rw-r--r--  1 ubuntu ubuntu 740M Mar  3 17:06 temp-1.rdb

In and of itself, this did not seem to fix the inability to export though. However “tutor local launch” did (I apparently already deleted all the nutmeg snapshots, so I only had a still-broken Olive-upgrade-from-Nutmeg VM still available.)

I believe that the command above resolved the problem, as you now have an appendonly.aof file which of just 832MB. This means that Redis should no longer have any problem to start.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.