Here are my notes on the meeting:
- @Felipe’s summary of my position on the Helm OEP is accurate: I’d love to make it easier to integrate Tutor & Helm, but I think that we need to do it in a way that is compatible with the existing plugin ecosystem.
- I’m excited to hear about @braden’s implementation of a load balancer with Helm that plays well with Tutor and the plugin ecosystem.
- Issues with the current k8s implementation in Tutor:
“There’s no way around Caddy”
I do not understand this specific point. Caddy serves two purposes in a Tutor-based k8s deployment. It’s:
- an ingress, with SSL certificates and all.
- a web proxy that handles http requests.
Caddy does not do load balancing; Kubernetes is in charge of that. All Caddy does is say: “I’m getting this http request, let’s process it a little, then forward it to http://lms:8000 (for instance)”. Kubernetes (or docker-compose) is then in charge of load-balancing across the containers behind the “lms” service. I think that Caddy is doing a great job as a web proxy (role 2). Caddy as an ingress (role 1) can be disabled by (unintuitively) setting ENABLE_WEB_PROXY=false. So I’m not sure why we are pointing fingers at Caddy here.
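To illustrate role 2, a Caddy web-proxy rule essentially boils down to the following. This is a simplified sketch, not the actual file that Tutor generates, and the hostname is made up:

```caddyfile
# Simplified sketch of a Caddy v2 reverse-proxy rule (hypothetical hostname).
# Caddy "processes the request a little" (e.g. enforces a body size limit)
# and then hands it to the "lms" service; whatever sits behind that name
# (Kubernetes or docker-compose) does the actual load balancing.
lms.example.com {
    request_body {
        max_size 4MB
    }
    reverse_proxy lms:8000
}
```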
“Volume persistence is inconvenient, in particular across availability zones”
Is this issue related to this other one: Improve storage model of Caddy's pods in K8s · Issue #737 · overhangio/tutor · GitHub? If so, please comment there. I must admit that in this matter I’m limited by my own understanding of k8s volumes. Also, it’s difficult to propose a default implementation that works across k8s providers, which all offer different persistent volume types.
Again, as on many other matters, I’m really open to comments. Would Helm help resolve this issue? In my understanding it wouldn’t, but I might be wrong.
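For what it’s worth, the provider-dependence usually shows up in a single field of the PersistentVolumeClaim: storageClassName. A hedged sketch (the claim name is made up; the class names are each provider’s common defaults, which is exactly the portability problem):

```yaml
# Sketch of a PVC. "gp2" is the usual default on AWS EKS; GKE uses
# "standard" and DigitalOcean uses "do-block-storage", so there is no
# single value that works everywhere. EBS-style block volumes are also
# bound to one availability zone, hence the cross-AZ inconvenience.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: caddy-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2
  resources:
    requests:
      storage: 1Gi
```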
“Managing plugins is tricky”
This topic is close to my heart
If I understand correctly, the argument being made here was that it’s more work for end users to change their Open edX platform by creating a plugin than to have the feature baked into Tutor core upstream.
I have so many things to say on this topic, but I’ll try to keep it brief.
Adding new features to Tutor core has heavy consequences for everyone: both Tutor maintainers and Tutor users. Let’s assume that Tutor maintainers have plenty of free time and energy and are able to competently maintain every feature that is added to Tutor core, and focus on Tutor users instead. Adding more features to Tutor core should not make it harder for these users to maintain their own platforms; but in general, adding a feature does make maintenance more complex.
Let’s take the example of this pull request, which was eventually closed without merging: feat: k8s horizontal pod autoscaling by gabor-boros · Pull Request #677 · overhangio/tutor · GitHub. It’s a shame that we’ve lost the original commits from this PR; originally, it introduced auto-scaling to the Open edX platform on Kubernetes. I absolutely love this feature. But it required 32 new Tutor settings, just for Open edX core (without ecommerce, forum, etc.). This is just too much added complexity for all Tutor end users to eventually have to deal with. So I recommended that Opencraft create a plugin that implements these changes. I also suggested that Opencraft maintain this plugin, as they clearly have the most expertise on the topic of Kubernetes auto-scaling.
My bottom line is this: addressing a few users’ use cases should not make life more difficult for many others. If you have a very specific use case, then there’s a good chance that you are not an individual but a company, and one with a dedicated team of engineers working on Open edX. It’s only fair that you put in the work, create a plugin and maintain it. This “philosophy” is behind many design decisions made in Tutor: in particular, the recent switch to the more powerful plugin V1 API, the extensive documentation of the plugin creation process, the future creation of third-party plugin indices, etc.
“The Tutor CLI is inconvenient, in particular for jobs”
I think that the CLI is okay (and it will be further improved in Olive) but I agree that the current implementation of k8s (and docker-compose) jobs is clunky.
Basically, the K8sJobRunner manually patches job specs to run ad-hoc tasks in the right containers: for instance, initialisation tasks such as database migrations.
I have tried hard to improve the implementation of k8s jobs, but could not find a better approach. In particular, this was the only way I found to avoid duplicating job declarations for every single task. I would love to have a better way to handle jobs.
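To make the “manual patching” concrete, here is a self-contained sketch of the idea (a hypothetical helper, not Tutor’s actual K8sJobRunner code): start from one declared Job manifest and override the target container’s command before submitting it, so that a single declaration can serve many ad-hoc tasks:

```python
import copy

def patch_job_manifest(job: dict, container_name: str, command: list) -> dict:
    """Return a copy of a k8s Job manifest with the named container's
    command overridden, so one job declaration can run many ad-hoc tasks."""
    patched = copy.deepcopy(job)
    containers = patched["spec"]["template"]["spec"]["containers"]
    for container in containers:
        if container["name"] == container_name:
            container["command"] = command
            break
    else:
        raise ValueError(f"container {container_name!r} not found in job spec")
    return patched

# A single, generic job declaration...
lms_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "lms-job"},
    "spec": {
        "template": {
            "spec": {
                "containers": [{"name": "lms", "image": "overhangio/openedx"}],
                "restartPolicy": "Never",
            }
        }
    },
}

# ...reused for an ad-hoc task such as database migrations:
migrate = patch_job_manifest(
    lms_job, "lms", ["sh", "-e", "-c", "./manage.py lms migrate"]
)
```

The alternative, without such patching, would be one Job declaration per task, which is the duplication mentioned above.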
The openedx container is a “gigantic monolith”
This is an actual issue, and an important one, but I do not think that it’s related to whether we use Helm/Kubernetes or not. Still, a few words about this…
I must admit that I cringe a little when I hear that the openedx Docker image is “not optimized for building”… I rebuild the openedx Docker image many times a day, and I really need that process to be efficient, so I’ve dedicated many hours to making this process as smooth and fast as possible.
For the record, the current openedx image is already layer-based. Those layers were designed thoughtfully, and if small changes trigger cache invalidation, there’s almost certainly a good reason for that. If a user believes that their changes to the Dockerfile should not trigger cache invalidation, they should implement the “openedx-dockerfile-final” patch in their plugin.
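As an illustration of that last point, such a plugin appends layers at the very end of the image instead of editing earlier layers. A minimal sketch using Tutor’s single-file YAML plugin format (the plugin name and the pip package are made up for the example):

```yaml
# Hypothetical single-file YAML plugin. Because "openedx-dockerfile-final"
# lands at the end of the Dockerfile, earlier cached layers stay valid.
name: myimagetweaks
version: 0.1.0
patches:
  openedx-dockerfile-final: |
    RUN pip install my-private-xblock==1.0
```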
Just the Python virtualenv and the node_modules take up 1.2 GB of disk space, so I find it unlikely that we will ever be able to generate sub-1 GB uncompressed images (note that the vanilla compressed images are already under 1 GB). I’m not saying it’s impossible, but I do not know how to improve this situation, and I’m very much open to contributions.
On that topic, I’m afraid that any further non-trivial improvements will require major upstream changes in edx-platform (but I would love to be proved wrong).
Helm as a default tool
When I first started working on deploying Open edX to Kubernetes, I seriously considered Helm. One of the reasons I chose kubectl apply over Helm is that, in my understanding, k8s manifests can be consumed by Helm, but not the other way around. What I mean is that Helm does not have to replace the current manifest-based k8s implementation: it can complement it.
Thus, this conversation should probably not revolve around “let’s replace tutor k8s with something else based on Helm”, but instead around “can we make tutor k8s work with Helm?”
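To illustrate the “Helm can consume plain manifests” direction: in principle, the files that tutor k8s renders (under env/k8s/ in the Tutor project root) could be copied into the templates/ directory of a bare chart next to a minimal Chart.yaml. This is a sketch of the idea, not a tested workflow, and the chart name is made up:

```yaml
# tutor-openedx/Chart.yaml -- a minimal chart whose templates/ directory
# would simply contain the manifests Tutor renders under env/k8s/.
apiVersion: v2
name: tutor-openedx
description: Tutor-rendered Open edX manifests wrapped as a Helm chart (sketch)
version: 0.1.0
```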