Let's talk about the native installation

The Open edX native installation is currently the only installation method that is officially supported by edX: see docs here and in Jira.

As everyone here knows, the native installation suffers from quite a few major issues. I’ll just name two here:

  1. A very high level of complexity, which makes it very hard to contribute to the edx/configuration repository.
  2. No upgrade method for existing releases.

EdX acknowledges these issues and has taken steps to address them. It was decided in OEP-45 (also see PR) to move to a Docker-based approach for installing Open edX :whale:.

I think that the community needs to contribute to this effort: whichever method edX decides to use to deploy Open edX, it is very likely that the rest of the community will not want to use exactly the same installation method. Also, I don’t want to speak for edX, but it seems to me that they will not be interested in maintaining a repository which they do not use.

Thus, I think that the Open edX community should step forward and propose an implementation of a Docker-based production installation for Open edX.

In case it wasn’t clear, we are looking for volunteers here :slight_smile: :raising_hand_woman: :raising_hand_man:

Now of course I have proposed a Docker-based installation method for Open edX for a few years now, and that is Tutor. It’s robust, documented, extensible, scalable and supports one-click upgrades. If I had to do it all over again, I wouldn’t do anything differently. So I’m not going to step forward and propose a new Docker-based deployment method. But we need to acknowledge that this implementation exists and could be proposed as the new native installation for Open edX. I understand that Tutor has its limitations: for instance, it does not yet support deployment of the Open edX master branch, but this is being worked on in the edge branch of the tutor repo. Also, Tutor does not have a clear transition path from the current native installation, although it’s been done before.

Work by edX has already started to propose a Docker-based installation method. For instance, the edx-platform repo now includes a production-ready Dockerfile. The Open edX community may want to wait until edX releases their own deployment implementation, as was agreed here, but the community should step in early and discuss what the best deployment strategy is for them.

What do you think? Are you interested in contributing to this effort? Are there arguments against Tutor being the officially supported native installation? I’d like the BTR working group to have this conversation, to keep the topic open and move towards a solution.

8 Likes

Hey,

We have been deploying Open edX to production with Docker for quite some time, using Tutor (the early releases from a year ago) as a base for the images (with GitHub Actions for CI [1]). We would love better integration with the base installation, and to contribute what we can.

The most difficult parts of a Docker installation are distributing the static/configuration files and CodeJail, which require more attention to deploy securely. Right now edx/configuration is hard to deploy on multiple machines. Standardizing the storage on S3 or S3-compatible software would help a lot in distributing the application to more servers and offloading the LMS (and would prevent the pain of migrating from FileStorage to S3 when needed).

We just migrated our storage to Minio (s3boto3) as an S3 provider last week. (We are running on bare metal, and not using Tutor for this right now.) We are moving the applications to Kubernetes and can’t use FileStorage in a distributed setup; we also don’t want to use NFS for file storage, since S3 is what edX uses by default. Moving to S3 helps a lot in a distributed scenario: for example, reports and exports are now signed and not open to the world. We are also moving the static files to Minio (already on staging), so that we can decouple the static storage and fix some problems we have when deploying new versions. The hashes change on deployment and we need both versions of the static files to be available, which we can do by copying the new static files to the S3 bucket, deploying the new version, and then syncing/deleting the old files.
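
The copy, deploy, then clean-up order for hashed static files could be sketched roughly as below. This is only an illustration under stated assumptions: the function name and the file names are hypothetical, and the bucket is modelled as a plain set of keys instead of a real S3/Minio client.

```python
def plan_static_rollout(bucket_keys, new_build_keys):
    """Split a hashed-static-files rollout into two phases.

    bucket_keys: keys currently in the static bucket (old release).
    new_build_keys: keys produced by the new build.
    """
    bucket_keys = set(bucket_keys)
    new_build_keys = set(new_build_keys)
    # Phase 1 (before the deployment): upload whatever is missing, so both
    # releases can find their files while the rollout is in progress.
    upload_before_deploy = sorted(new_build_keys - bucket_keys)
    # Phase 2 (after the deployment): delete files only the old release used.
    delete_after_deploy = sorted(bucket_keys - new_build_keys)
    return upload_before_deploy, delete_after_deploy


# Hypothetical file names, for illustration only.
old_release = {"css/lms-main.1a2b3c.css", "js/commons.4d5e6f.js", "images/logo.png"}
new_release = {"css/lms-main.9f8e7d.css", "js/commons.4d5e6f.js", "images/logo.png"}

to_upload, to_delete = plan_static_rollout(old_release, new_release)
print("upload before deploy:", to_upload)
print("delete after deploy:", to_delete)
```

Files whose hash did not change appear in neither list, so only the changed assets are moved around.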

Doing everything with tutor would help a lot in preventing some of the problems we face right now, and it will help to have working/reproducible builds that we can rebuild on demand to test if old releases are working, while also having a backup of working images of them.

Best,
Felipe.

[1] https://github.com/eol-uchile/eol-edx

3 Likes

Hi Régis,
I fully agree that Open edX should have a renewed setup for production environments. I believe Docker + Kubernetes are state-of-the-art and suitable now for production, so for sure they should be considered. I am not an expert in these technologies, but I can help in thinking through the principles of an infrastructure architecture. The design should keep scalability in mind. Components like the LMS, MySQL and MongoDB have VERY different infrastructure behaviour (e.g. disk access, memory, CPU, persistence vs. statelessness, etc.), so we should decouple them and design scalable infra for each. Let me know how we can collaborate to arrive at a recommendation.

1 Like

Hi @regis, thanks for bringing this up!

In response, we’ve discussed this internally at OpenCraft and decided to evaluate whether both using Tutor for our projects and pushing it as the de facto reference implementation (one is dependent on the other) make sense for us. We’ll begin playing around with it seriously over the next couple of weeks, after which we’ll probably have lots of questions and suggestions.

To start with, though, let me bring up a couple of things that are important for us, and we’d like to hear from you (and the community), what you think:

  • How’s master support coming along? How easy will it be to put it under CI/CD whenever a new edx-platform@master commit comes down the pipes (so we can fix it if the alert is sounded)? We deem this to be an essential first step before considering it as a (or the) reference implementation.
  • Do you think it is reasonable to expect Tutor to be able to use the same Dockerfiles that edX will (reportedly) provide and maintain as a result of OEP-45? In an ideal world, even if edX don’t make their own orchestration configuration public, it would be best for the community if everybody’s using similarly constructed container images.
  • Kubernetes is the only sane container orchestration engine that scales. Do you envision Tutor not only continuing to support it, but expanding support as the need arises?
  • Last but not least (for now), we’re concerned about governance: edX has recently come out with the Core Committer program, of which you’re one of the first few (congrats, btw :). Would you be willing to consider something similar for Tutor?

Note that if we end up pushing forward as suggested, you’ll probably be getting plenty of contributions from us on all of the above areas. (Fixing master when it breaks, contributing a way to factor theming out into a volume, etc.) The idea being that more people start to do the same, hopefully. :slight_smile:

3 Likes

Thanks for your detailed comment and your enthusiasm Adolfo! It would certainly be great to count OpenCraft among the Tutor users and contributors.

First, let me clarify something: a few people messaged me saying “I saw your post about Tutor becoming the new default community installation”. My message was unclear: although I would be thrilled to see Tutor become the default recommended installation method, I am not pushing for Tutor to replace the native installation. Instead, what I am saying is that a task force should already start working on a default, community installation of Open edX that implements OEP-45. This task force might end up concluding that Tutor is the right choice (and again, I think that would be great), but people need to start thinking about this, because edX is not going to make the first move.

That being said, let me try to answer your questions.

I made a proof-of-concept of running the edx-platform master branch in the edge branch of the Tutor repo. In the future, I expect that developers who want to run or test the Open edX master branches will run from this edge branch. I’m not sure yet whether we will provide images for this branch.

It’s not going to be trivial, but it should be easy enough. The current CI is based on a GitLab instance and runs on Kubernetes. Both GitLab and K8s are self-hosted, so it should be scalable and configurable at will. Tutor also uses Travis CI, but this is just for building the Tutor binaries, and I don’t expect to use it for other tasks.

I understand the point, and this would be possible, but I don’t expect this will be the most reasonable choice. The Docker images are the central pieces of the puzzle, so Tutor needs to have full control of how they are generated. For instance, I went to great lengths to generate static assets in the Docker images in a reliable and efficient way. The edx-platform Docker image simply runs paver update_assets which I found does not work well.

Another difference is how Tutor serves static assets: Tutor recently switched from Nginx to Whitenoise and it works great. (And yes, it scales well :slight_smile:) It’s those small differences that add up and make edX’s and Tutor’s Docker images quite different.

That being said, if both images become sufficiently well aligned, I don’t exclude the possibility that one will inherit from the other. It’s just that I don’t expect this will happen any time soon.

I’m not sure I agree with your statement (especially the “sane” part), but yes Tutor will continue to support the Kubernetes deployment target. I’m not very strong with K8s operations so K8s is not my priority at the moment. For instance, I did not give enough attention to this PR which improves K8s support but seems to break the docker-compose compatibility.

Currently, Tutor is a thin wrapper on top of basic K8s manifest files which can be deployed with kubectl apply. This makes it easy to extend and customize an Open edX platform for K8s administrators. For instance, I don’t think it makes sense to provide a one-click installation for Kubernetes. It’s reasonable to expect that K8s administrators know what they are doing and have the skills to perform slightly complex operations manually.

However, I think it would make sense to develop a Tutor operator for Kubernetes: a sort of CD operator that would automatically build, push and re-deploy images as changes are pushed to the various repos.

So to answer your question: yes, I’ll be happy to extend support for K8s but we need to talk about what we want to add :slight_smile:

Absolutely. While one person is enough to develop and upgrade Tutor and all its supported plugins, it becomes near impossible to do this just by myself when we need to keep the master branch running daily. So I would need help from the community to maintain the edge branches of Tutor and its plugins. However, I want to keep a veto right for all new features: a Tutor-deployed Open edX platform should remain lean and simple, so I’m uninterested in pull requests that add features to be used by a small minority of users. If someone wants to do things differently, the best approach is that they develop their own plugin. Of course, if something in the Tutor codebase prevents them from developing such a plugin (such as a missing {{ patch(...) }} statement) then we’ll try to fix that for them.
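
To make the idea of a patch statement concrete, here is a minimal sketch of how such an extension point can work. It is not Tutor’s actual code: it uses a regular expression from the standard library as a stand-in for a real template engine, and the template content, the patch name and the setting are hypothetical.

```python
import re

# Matches {{ patch("some-name") }} in a template string.
PATCH_RE = re.compile(r'\{\{\s*patch\("([^"]+)"\)\s*\}\}')


def render(template, plugin_patches):
    """Replace each patch statement with the content plugins registered for it.

    plugin_patches maps a patch name to a list of snippets, one per plugin.
    A patch name that no plugin uses renders as an empty string.
    """
    def substitute(match):
        return "\n".join(plugin_patches.get(match.group(1), []))

    return PATCH_RE.sub(substitute, template)


# Hypothetical template and patch name, for illustration only.
template = 'FEATURES = {}\n{{ patch("lms-settings") }}\n'
patches = {"lms-settings": ['FEATURES["ENABLE_MY_PLUGIN"] = True']}

rendered = render(template, patches)
print(rendered)
```

The point of the design is that a plugin can only inject content where the template has a patch statement, which is why a missing statement blocks plugin authors until it is added.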

Also, we would discuss the short-term roadmap with the other maintainers. For instance, here’s what I have in mind as the next major evolutions:

  • Replace Nginx with Caddy: this would make it much simpler to handle SSL/TLS certificates. However, it would break compatibility with existing plugins, so we would need to do the migration incrementally: first, add a Caddy server on top of Nginx, then remove services from Nginx, then remove the Nginx server altogether.
  • Replace RabbitMQ and Memcached with a single Redis instance: according to my preliminary testing, this should be possible and would simplify the platform architecture.

This is really the best thing that could happen to Tutor :slight_smile:

4 Likes

I would love to see EdX.org and the OpenEdX community align on as much shared code as we can. Edx.org runs openedx at significant scale, with dozens of daily code deployments and millions of active users. Therefore, our scalability needs are not going to be applicable to most people who run openedx: we will make decisions based on our user request load and our budget that simply don’t make sense for anyone else.

One of those decisions is static assets. When we first started open sourcing our deployment codebase, we were building and deploying ec2 AMIs once a week. We didn’t have versioned assets, and our frontend code was relatively simple and generally backward compatible. So we decided to couple static assets with the application code in a single machine image. This simplified our deployments and testing. However, when we scaled up our deployments to happen multiple times a day, we discovered that this method of serving static assets directly from the application server was incompatible with versioned assets, blue/green deployments and our CDN. Long story short, the new application server would tell a user to get an asset: /css/somestylesheet-f287eea.css. The user’s browser would reach out to our CDN, which would reach out to the application servers to fetch the asset. However, 50% of those requests would hit an older server that didn’t know about that version of the stylesheet and return a 404. Frontend error rates were spiking to nearly 100% of all sessions with every release, and would continue until our CDN’s negative result cache expired. So edx.org ships static assets directly to S3 during our build/deployment process. This is only necessary because of the size of our userbase and the fact that we do a blue/green deployment 10x a day.
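
The failure mode above can be reproduced with a toy model. This is a simplified sketch and not edx.org’s deployment code: the stylesheet contents are made up, the hash scheme is an assumption, and the CDN is assumed to alternate evenly between one old and one new origin server.

```python
import hashlib
from itertools import cycle


def versioned_name(path, content):
    """Return path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:7]
    stem, ext = path.rsplit(".", 1)
    return f"{stem}-{digest}.{ext}"


class AssetStore:
    """Anything that can serve assets: answers 200 only for names it holds."""

    def __init__(self, assets):
        self.assets = set(assets)

    def fetch(self, name):
        return 200 if name in self.assets else 404


old_asset = versioned_name("/css/somestylesheet.css", b"body { color: black; }")
new_asset = versioned_name("/css/somestylesheet.css", b"body { color: navy; }")

# Mid-deployment fleet: each server only holds the assets baked into its image.
fleet = [AssetStore({old_asset}), AssetStore({new_asset})]

# The CDN fetches the new asset from the fleet, alternating between servers.
origin = cycle(fleet)
statuses = [next(origin).fetch(new_asset) for _ in range(10)]
print("failed origin fetches:", statuses.count(404), "of", len(statuses))

# Shipping assets to shared storage at build time keeps every version available.
shared_bucket = AssetStore({old_asset, new_asset})
print(shared_bucket.fetch(old_asset), shared_bucket.fetch(new_asset))
```

In this model the shared bucket answers for both versions regardless of which release a given application server is running.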

Another question is SSL - we use cloudflare, AWS ACM, and Letsencrypt to terminate SSL in our various environments. However, we do not re-issue SSL certs for each of our development environments because we would very rapidly hit the letsencrypt domain certificate limits. We therefore use wildcard certs in our dev environments so that we don’t have to re-issue certs. Moving from nginx to caddy doesn’t make sense for us.

So, much like Regis wants tutor to remain focused on generally useful features and configurations, we (edx.org) aren’t always going to be a good match for what is in the community’s best interests. I welcome requests for information, and I would love to figure out some ways we can share as much deployment code as possible. Our push to get more and more applications building docker images to https://hub.docker.com/u/openedx with a sane production ready default configuration is one of those. These are the docker images we are running in production in our k8s environment, and they’re also what we’re basing our next gen devstack and sandbox code on, and I’d like to keep that true for as long as we can.

Maintaining Ansible code is burdensome for our developers and a hindrance to our ability to move quickly. We’re going to try to keep doing it for a little while longer, but I really hope Ansible isn’t a part of the “L” openedx release at all.

Please let me know what else we can do, or if there’s anything that we’re doing that’s not helpful.

Thanks @fredsmith for this detailed information! This confirms my impression that edx.org and the rest of the community might have different deployment strategies, but that they are reconcilable. I want to provide good defaults for most Tutor users, but I also want to enable more advanced users to replace some of the pieces of the deployment strategy with their own, like you do with edx.org. So in the end Tutor might not be a 100% good fit for edx.org, but I aim to make it a good starting point for system administrators who need an Open edX platform that is both scalable and easy to administer.

As I mentioned elsewhere, in my opinion, three things are missing for Tutor to be a good replacement for the native installation:

  1. A “bleeding edge” working deployment of the master branches.
  2. A clear transition path from the current native installation.
  3. Passing unit tests from inside the Tutor Docker images.

All three items are on the roadmap.

It was discussed in the last contributors’ meetup whether the group that decides on the future of the community installation should be part of the BTR working group. I think this project falls squarely within the remit of the working group. It makes little sense to dissociate the release process from the testing of the actual release installation (IMHO). So we can discuss the progress of the “community installation v2 task force” (that’s a mouthful) during the weekly BTR meetups.

@arbrandes Please don’t hesitate to get in touch if you have any questions about Tutor, but you know that already, right? :wink:

As an alternative to a full migration to Tutor, we could instead start migrating services to Docker in the current edx/configuration repository, and have Ansible install Docker and start the services (we could probably have supervisord start the Docker images).

This allows us to migrate one service at a time, reusing all the configuration already done by the configuration repo, and gives us an incremental upgrade path for all services, as not all of them have a ready-to-use Docker image right now.
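
As a rough sketch of what the supervisord part could look like, the snippet below renders one `[program:x]` section per service that already has an image and leaves the others to the existing native setup. The service inventory and the image names are hypothetical, and this is not taken from edx/configuration; in practice the rendering would more likely live in an Ansible template.

```python
def supervisor_program(name, image):
    """Render a supervisord [program:x] section that runs one container."""
    return "\n".join([
        f"[program:{name}]",
        # --rm removes the container on exit, so a restart can reuse the name.
        f"command=docker run --rm --name {name} {image}",
        "autostart=true",
        "autorestart=true",
    ])


# Hypothetical inventory: None means "no usable image yet, keep the native install".
services = {
    "forum": "example/forum:latest",
    "lms": None,
    "notes": "example/notes:latest",
}

migrated = {name: image for name, image in services.items() if image}
config = "\n\n".join(supervisor_program(n, i) for n, i in sorted(migrated.items()))
print(config)
```

Moving a service over then amounts to filling in its image name, which matches the one-service-at-a-time approach.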