Tech talk/demo: Deploying multiple Open edX instances onto a Kubernetes Cluster with Tutor

braden · March 31, 2021, 6:22pm

At OpenCraft, I’ve been doing some exploration about how we might use Tutor for our Open edX hosting services, which can involve us managing hundreds of separate Open edX deployments. (We currently use our in-house Ocim software for managing most of our instances, which works well but is not docker-based.)

My exploration has resulted in a proof of concept implementation, and I’ve put together a video (15 min) showing what I’ve built so far, which lets you use Tutor + Terraform + GitLab CI to manage a Kubernetes cluster that can have many different Open edX instances running on it:

The code is available at on GitLab: opencraft / dev / Grove · GitLab

The OpenCraft team has already been discussing this quite a bit, and you can read that and join the discussion on this thread: Tech talk/demo: Deploying multiple Open edX instances onto a Kubernetes Cluster with Tutor - Announcements - Public - OpenCraft

I’d love to hear from @regis about this approach, as well as any others who are interested in a similar use case.

regis · April 1, 2021, 10:50am

This is a great presentation! thank you Braden and Opencraft for putting this together. Here are my notes, both from the video, the Opencraft forum post and the notes in the gitlab repo:

Yes, tutor prints an obnoxious warning when you run it as root. If possible, I’d like to keep this warning, because users very frequently make the mistake of running tutor both as root and as a different user, resulting in different env folders, but identical containers, leading to misunderstanding, despair, and mayhem. I’m only slightly exaggerating here. Seriously, it’s a big issue that pops up every now and then on the forums. Also, I’d like to suggest that you do not run tutor as root, even inside containers: this will make it easier for you in the future to transition to rootless containers (which many people want and need).
You mentioned that you would like to define tutor settings as environment variables: this is already possible by defining env vars prefixed by TUTOR_: Configuration and customisation — Tutor documentation – But now that I’m reading your wrapper-script I realize that you already know that.
When running tutor k8s quickstart, you get the interactive prompts for setting a few platform parameters (url, language, etc.). If for some reason you don’t need this part you can simply run instead tutor k8s start && tutor k8s init.
I stronly suggest not creating different kubernetes clusters for every platform. This will make maintenance very difficult in the future. Instead, each platform should run in a different namespace, defined by K8S_NAMESPACE. This should greatly simplify the terraform scripts.
In my experience, a single 8Gb DigitalOcean node should be enough to run a full Open edX platform – although this is obviously not recommended in production.

Again, great job!

PS: On an unrelated note: please avoid “Tutor-something” project names, as Tutor is now a registered trademark. “Tutorraform” is perfectly OK if it’s only used internally, but not if it becomes widely used.

braden · April 1, 2021, 5:38pm

Thanks for checking it out, @regis ! And I have to say a big thank you to you - Tutor is really impressive, and it worked flawlessly (and on the first try) for everything I asked it to do

No worries, it’s up to you. I was just hoping that we could make the warning more subtle when Tutor is run inside a docker container; not get rid of it completely.

I can disable the interactive prompts with --non-interactive. But Tutor still rewrites the config file, which is what I’m trying to avoid. From what I can tell, neither start nor init re-generate the env, but maybe I missed something. What I really want for this use case is a command that does this part and can be used after I manually change the config file. That is, it should: re-generate the env, start/restart the platform, and run migrations/upgrades, but not rewrite the config file.

Isn’t that what I’m doing? This project is focused on exactly that, running multiple Open edX instances on the same Kubernetes cluster, and puts each one into a different k8s namespace.

Ah, ok. Well “Tutorraform” is a terrible name anyways, so if we develop this beyond this initial proof of concept, I don’t mind changing it. Open to ideas, if anyone reading this wants to suggest something

What about unofficial tutor plugins - can we still call them something-tutor-plugin ?

regis · April 5, 2021, 10:02am

Right. This use case has not occurred, yet – in part because it’s a bit risky. If some of the generated variables (such as passwords) are not correctly set, they will be re-generated later with different values – and lost.

We can certainly add a -C/--skip-config-file option to the config save command to address your use case.

Right! Sorry about that, I did not take the time to read in details the scripts and instructions before replying.

Good question: I’d rather avoid “tutor*” names for non-official plugins. However, currently the plugin cookiecutter automatically generates tutor-* packages. I guess I should change that to, for instance, “tutor-contrib-*” packages. I’m open to ideas, too.

braden · April 6, 2021, 8:20pm

tutor-contrib seems fine to me if you aren’t willing to accept ___-tutor-plugin or ____-for-tutor.

imantaba · May 31, 2021, 5:05am

there is not any -C/--skip-config-file option for tutor config save
tutor, version 11.2.11

regis · May 31, 2021, 6:32am

This is correct. This option was never added because no one worked on it. The change is simple enough to implement, but I wouldn’t be sure how to name the option. “–env-only”? “–skip-config”? “–no-config”?
What’s your use case @imantaba? Do you also need this option? Why?

imantaba · May 31, 2021, 7:15am

Hi @regis ,
Thank you for fast reply. With this change some of our issues will be resolved . For example i want to change this part of config , but i can’t change it with my custom plugin. I think --skip-config is better option.

regis · May 31, 2021, 7:36am

@imantaba The DATA_DIR part of the configuration has nothing to do with the --skip-config option. I suggest you create a dedicated Tutor plugin that will add content to the settings file. Let’s not derail this thread further: for further Tutor support, you should head to the Tutor forums.

alfred · July 30, 2021, 3:24pm

Hi everyone, I wanted to let you know that Tutorraform has been renamed to Grove, you can find the project here opencraft / dev / Grove · GitLab

lpm0073 · April 13, 2022, 4:51pm

Hello all, i’ve been doing this for a while, on a couple of platforms using this code base or the code from which this was derived: https://github.com/lpm0073/cookiecutter-openedx-devops.

happy to collaborate if you guys are still focused on kubernetes.

braden · April 13, 2022, 9:43pm

@lpm0073 Very nice work! We would be happy to collaborate.

We have been continually developing our Tutor+Kubernetes management tool called Grove, although we are just starting to move it into production this month so we don’t have as much production experience yet. Our use case is a bit different as it’s focused on managing multiple complete Open edX instances (1-100) on one or more Kubernetes clusters, and we currently support either AWS or DigitalOcean, with the possibility to add more providers in the future.

CC @gabor who is leading the Grove project now.

lpm0073 · April 14, 2022, 1:37pm

outstanding. My objective is to evolve a community-supported code base that provides a Tutor-like 1-click experience for creating and maintaining a horizontally scaled back end. I’m fine with a solution belonging to someone like OpenCraft as long as the long-term vision is for there to always be a free community-supported product alongside however you guys make money. Adding even more providers – Azure, GCC – seems like a good way to attract more adoption and more contributors.

My current areas of focus on my own code base are a.) getting Fargate to work with all of Tutor’s pods, and b.) replacing the AWS Classic Load Balancer with an ALB. I had an encouraging work session last week with solutions guys at AWS that makes me cautiously optimistic that i might get both of these to work in the coming months.

my current code base handles multiple complete open edx installations btw, but not entirely to my satisfaction. it creates separate instances of data services for mysql, mongo, and redis, which is good. but i’d like to be able to combine dev/test/prod into the same instances on a per-installation basis in order to lower the cloud provider costs. using my code, the resulting AWS bill for one environment of one installation costs around $300 USD monthly, but it could possibly be lowered. my clients who manage multiple environments for test / stage / prod all spend at least $1,000 per month for baseline service, which seems excessive.

This is a priority for me this year, and i can bring some limited financial support via existing clients who would benefit from broader adoption and support of a code base that already supports their platforms.

next steps?

braden · April 18, 2022, 9:49am

Our model is pretty simple: the software is open source and we want to build a community around it. There’s no paid portion of the software. We make money when people want us to manage the hosting for them completely, but anyone who wants to do it themselves or use it for a similar business can do so for free, provided they contribute any improvements back to the project too.

For Grove, we are using a concept of “cluster” and “instances”, where each cluster is a Kubernetes cluster with one shared MySQL DB, one shared MongoDB, one shared Redis instance, and then one or more Open edX instances. So if you have the requirement for more isolation of a particular instance, you just deploy it on its own cluster and then all its resources are exclusive. For smaller sites, dev/staging, etc., you just deploy them on the same cluster, and costs will be much lower. And the codebase is relatively simple because the same approach is used for both modes.

That’s the main reason we are supporting DigitalOcean, as it’s much more affordable.

Adding @gabor here as he is working on the Grove roadmap and can probably suggest better next steps than me. Also @lpm0073 will you be at the conference? If so, we can chat in person next week too.

BTW I haven’t tried Fargate and would be curious to hear how it works for you.

Andres.Aulasneo · April 18, 2022, 10:47am

Hi @lpm0073 and @braden!
We are working also in a Kubernets infrastructure for our Open edX service. We are focusing on AWS for the time being.
In our experience, Fargate is not good enough for a world class service. Actually, working with EC2 instances allows you more control, including the possibility to reduce costs by purchasing reserved instances.
In our investigation, we found much better to have only caddy, lms, cms, forum and MFE in K8s. For all the rest: MySQL, MongoDB, Redis and OpenSearch, we use AWS services. MongoDB deserves a special discussion, as DynamoDB is not fully compatible and Atlas is way too expensive.
There is also a small pitfall in the caddy pod, which is statefull and brings some problems in terms of resiliency. I have a pending task to create a PR to work this around.
I was also interested in changing the standard ELB by ALB and adding a WAF, but didn’t try yet.
Are you guys going to Lisbon? If so we can have a meeting there. Anyway, I’d be more than happy to have a talk and share experiences.

braden · April 18, 2022, 11:35am

Yep, I will be at the conference all week next week. Let’s definitely meet up.

lpm0073 · April 18, 2022, 1:49pm

i have a schedule conflict but can communicate with you all remotely.

lpm0073 · April 18, 2022, 1:58pm

hi @Andres.Aulasneo! that’s great feedback and i’m excited that you’ve been focused on the same topics. lets definitely talk more. re your comments:

Fargate: i was mainly interested in using Fargate for highly volatile, bursty services like the lms worker but i’ve been pretty disappointed in the lag time for it to scale up. but the service might evolve in the future. meanwhile i’m going to refocus elsewhere.

back end services: for my clients’ production sites i use mongodb installed directly on an EC2 instance which has always worked fine. i’ll probably regress to this in my Cookiecutter template.

gabor · April 18, 2022, 9:42pm

Thanks for looping me in!

As of the next steps, as @braden mentioned, I’m working on a roadmap for Grove. For the next two weeks we will focus on missing features, providing more documentation and a clearer communication about the project and its current state. After the conference, when everyone at its full capacity, we will focus on monitoring, observability, worker scaling, and more.

We have some discoveries starting tomorrow, that’s the reason I wasn’t more specific, since the outcome may changes orders or adding new items on the roadmap.

Unfortunately, I wasn’t able to find an option yet, but I’ll try to make the roadmap somehow publicly available in the upcoming days so it would be more visible, if you agree with me too, @braden .

Unfortunately, I cannot participate. Could you please somehow share the info with me after the meeting?

Actually that can be combined with spot.io which can help reduce costs further, though I think it will be still pricier than DigitalOcean – though that’s a personal opinion and highly depends on usage.

Although I never used Fragate, I would love to learn what issues you had with it.

lpm0073 · April 19, 2022, 2:55pm

Fargate.
Good: it looks and behaves like a normal Linux instance, understanding that each instance is a single virtual core. there is nothing to administer nor manage. you instantiate it, use it, terminate it – all on automatic pilot.

Bad:
a.) Cost - as @Andres.Aulasneo mentioned, it’s objectively the most expensive compute option that AWS offers, so you need a use case that leverages what Fargate offers while avoiding it entirely in all other cases. otherwise, you’ll be very, very sad when your AWS monthly invoice arrives.

b.) Latency - it takes a while for a Fargate instance to instantiate. To be fair, the latency is not any different than that of a typical EC2 instance. But, you only get 1 virtual cpu with each Fargate instance and so you’re prone to suffering painful latency periods while these serially spin up during a burst of user activity. There are use cases like with AWS Lambda where Fargate is a good fit, but these are edge cases.

Topic		Replies	Views
Deploying Open edX on Kubernetes Using Helm Collaborative Proposals k8s	55	4751	July 13, 2023
Let's talk about the native installation Build-Test-Release	17	4806	February 5, 2021
Deploy an instance of openedx without tutor Tutor Help devops	5	238	March 19, 2025
CI with Tutor and Github Actions News tutor , maple	2	496	January 5, 2024
Announcing a new CookieCutter for deploying Tutor to Kubernetes at scale Site Operators devops , tutor , maple	5	1238	April 13, 2022

Tech talk/demo: Deploying multiple Open edX instances onto a Kubernetes Cluster with Tutor

Related topics