[K8S] RecursionError when initializing with MinIO

Hello everyone,

I am configuring a high-availability Open edX installation using Tutor. I am running on AWS with:

  • Main installation on an EKS cluster.
  • MySQL using AuroraDB-MySQL
  • MongoDB using AWS
  • Redis using an AWS ElastiCache cluster
  • SMTP using AWS SES
  • MeiliSearch and MinIO using the ones provided by Tutor.
    • Their storage is AWS EFS, set up on the cluster using its CSI controller.

When I tried to initialize the platform, the LMS init job that applies the migrations ran into the following RecursionError on a specific migration:

 Traceback (most recent call last):
  File "/openedx/edx-platform/lms/djangoapps/certificates/migrations/0003_data__default_modes.py", line 20, in forwards
    conf.icon.save(
  File "/openedx/venv/lib/python3.11/site-packages/django/db/models/fields/files.py", line 93, in save
    self.name = self.storage.save(name, content, max_length=self.field.max_length)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
< lines have been omitted >
  File "/openedx/venv/lib/python3.11/site-packages/botocore/awsrequest.py", line 614, in __init__
    self.update(*args, **kwargs)
  File "<frozen _collections_abc>", line 947, in update
  File "<frozen abc>", line 119, in __instancecheck__
RecursionError: maximum recursion depth exceeded in comparison

This specific migration enters an infinite loop, which crashes the script. I looked into it further and it is related to S3 storage, which in this case is handled by MinIO.

I have tried the following with no luck in fixing this:

  • Doing a clean install of Tutor: purging all config and reconfiguring from scratch.
  • Deleting the MinIO persistent volume and letting it be recreated and re-initialized.
  • Disabling the plugin and setting it up again.

I have also checked my configuration multiple times, but I doubt it is the cause, mainly because I did not change the Tutor-provided init values.

I am attaching a log file of the job in case it can be helpful. Any ideas on what could cause this or how to fix it would be greatly appreciated.

log.txt (9.6 KB)

I did some more digging and found this resolved issue: Minio breaks migrations with fresh install on teak · Issue #61 · overhangio/tutor-minio · GitHub

Could these two be related? cc: @Danyal_Faheem

Hi @Retr0, thank you for bringing this to our attention.

Looking at the two issues, they do indeed appear at the same migration step, but I believe that is because it is the first point in the setup process where S3 is used (to upload files). The issue mentioned in Minio breaks migrations with fresh install on teak · Issue #61 · overhangio/tutor-minio · GitHub occurred due to a boto3 version upgrade made in the edx-platform master branch, and a release from that branch is yet to be cut.

From the very helpful logs provided, I can see that boto3 retries the PUT request and tries to look up the bucket region. Can you verify that the specified bucket is accessible by openedx?

Could you also share the tutor and tutor-minio versions you are using so that we can look into this further?

Hey @Danyal_Faheem, thanks for the quick response!

The bucket specified should be accessible considering MinIO is running using the default options provided by the plugin.

I can also verify through the console panel that the main bucket (“openedx”) exists. One thing I noticed that I believe is worth mentioning is that the files.{{ LMS_HOST }} application is extremely slow and eventually times out when I try to visit it from a browser. All other Tutor apps (e.g. meilisearch.{{ LMS_HOST }} or minio.{{ LMS_HOST }}) are working normally, so I have no reason to believe it’s a DNS issue.

tutor version: tutor, version 19.0.4
tutor-minio version: 19.0.1

I have resolved the issue! For anyone else that might stumble upon this in the future…

Carefully read the HTTPS access section of the Configuration and customisation page in the Tutor documentation. In my setup, HTTPS termination was done by the AWS Application Load Balancer. This means that visiting an “http” link in a browser redirects you to its “https” counterpart with no issues, so with that in mind I set ENABLE_HTTPS=false and ENABLE_WEB_PROXY=false.

But this doesn’t work for applications. Applications (like curl, which helped me realize the issue after a lot of testing) will use “http” links if ENABLE_HTTPS=false, which means that they receive a 301 Moved Permanently response. In the specific case of boto3, this causes an infinite redirection loop.

So all I had to do was set ENABLE_HTTPS=true so that they use the “https” links directly and avoid the 301 redirection loop.
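
For anyone who wants to check for the same symptom, here is a minimal Python sketch of the kind of test that curl made visible: it sends a single HEAD request without following redirects and reports the status code and Location header. The function names `probe_redirect` and `is_https_upgrade` are my own, and this is only an illustration under those assumptions, not how boto3 or Tutor handles redirects internally.

```python
import http.client
from urllib.parse import urlsplit


def probe_redirect(url, timeout=5.0):
    """Send one HEAD request and do NOT follow redirects.

    Returns a (status_code, location_header_or_None) tuple.
    """
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    # parts.port is None when the URL has no explicit port; http.client
    # then falls back to the default port for the scheme.
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_https_upgrade(url, status, location):
    """True when a plain-http URL is answered with a redirect to https."""
    return (
        url.startswith("http://")
        and status in (301, 302, 307, 308)
        and bool(location)
        and location.startswith("https://")
    )
```

Calling `probe_redirect` on the plain “http” URL of the files host (substitute your own value for the files.{{ LMS_HOST }} name) and passing the result to `is_https_upgrade` should tell you whether something in front of the cluster, such as a load balancer, is answering “http” requests with a redirect to “https”. If it is, ENABLE_HTTPS=false is probably worth revisiting.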
