Let's talk about the native installation

The Open edX native installation is currently the only installation method that is officially supported by edX: see docs here and in Jira.

As everyone here knows, the native installation suffers from quite a few major issues. I’ll just name two here:

  1. A very high level of complexity, which makes it very hard to contribute to the edx/configuration repository.
  2. No upgrade method for existing releases.

EdX acknowledges these issues and has taken steps to address them. It was decided in OEP-45 (also see PR) to move to a Docker-based approach for installing Open edX :whale:.

I think that the community needs to contribute to this effort: whichever method edX decides to use to deploy Open edX, it is very likely that the rest of the community will not want to use exactly the same installation method. Also, I don’t want to speak for edX, but it seems to me that they will not be interested in maintaining a repository which they do not use.

Thus, I think that the Open edX community should step forward and propose an implementation of a Docker-based production installation for Open edX.

In case it wasn’t clear, we are looking for volunteers here :slight_smile: :raising_hand_woman: :raising_hand_man:

Now of course I have proposed a Docker-based installation method for Open edX for a few years now, and that is Tutor. It’s robust, documented, extensible, scalable and supports one-click upgrades. If I had to do it all over again, I wouldn’t do anything differently. So I’m not going to step forward and propose a new Docker-based deployment method. But we need to acknowledge that this implementation exists and could be proposed as the new native installation for Open edX. I understand that Tutor has its limitations: for instance, it does not yet support deployment of the Open edX master branch, but this is being worked on in the edge branch of the tutor repo. Also, Tutor does not have a clear transition path from the current native installation, although it’s been done before.

Work by edX has already started to propose a Docker-based installation method. For instance, the edx-platform repo now includes a production-ready Dockerfile. The Open edX community may want to wait until edX releases their own deployment implementation, as it was agreed here, but the community should step in early and discuss what the best deployment strategy is for them.

What do you think? Are you interested in contributing to this effort? Are there arguments against Tutor being the officially supported native installation? I’d like the BTR working group to have this conversation, to keep the topic open and move towards a solution.

10 Likes

Hey,

We have been deploying Open edX to production with Docker for quite some time, using Tutor (the early releases from a year ago) as a base for the images and GitHub Actions for CI [1]. We would love better integration with the base installation, and to contribute what we can.

The most difficult part of a docker installation is distributing the static/configuration files and the CodeJail, which require more attention to deploy securely. Right now edx/configuration is hard to deploy on multiple machines, and standardizing the storage to use S3/S3-compatible software would help a lot in distributing the application to more servers and offloading the LMS (and prevent the pain of migrating from FileStorage to S3 when needed).

Last week we migrated our storage to Minio (s3boto3) as an S3 provider. (We are running on bare metal and not using Tutor for this right now.) We are moving the applications to Kubernetes, and we can’t use FileStorage in a distributed setup; we don’t want to use NFS for file storage, as S3 is what edX uses by default. Moving to S3 helps a lot in a distributed scenario: reports and exports are now signed and not open to the world, for example. We are also moving the static files to Minio (already on staging), so that we can decouple the static storage and fix some problems we have when deploying new versions. The hashes change on deployment, and we need both versions of the static files to be available. We can do that by copying the new static files to the S3 bucket, deploying the new version, and then syncing/deleting the old files.
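The three-step static file rollout described above (copy the new files, deploy, then clean up) can be sketched as a small planning function. This is only an illustration under stated assumptions: the function name, the key names and the idea of comparing plain sets of object keys are hypothetical, and the actual upload and delete calls against the bucket are left out.

```python
from typing import Iterable, Set, Tuple


def plan_static_rollout(
    bucket_keys: Iterable[str],
    old_release_keys: Iterable[str],
    new_release_keys: Iterable[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (keys to upload before deploying, keys to delete afterwards)."""
    bucket = set(bucket_keys)
    old = set(old_release_keys)
    new = set(new_release_keys)

    # Before the deployment: add whatever the bucket lacks, so that the files
    # of both releases are available while old and new servers coexist.
    to_upload = new - bucket

    # After the deployment: remove files that only the old release referenced.
    to_delete = (old - new) & bucket
    return to_upload, to_delete


# Hypothetical hashed file names, for illustration only.
old_keys = {"css/app-aaa111.css", "js/main-222bbb.js"}
new_keys = {"css/app-ccc333.css", "js/main-222bbb.js"}
upload, delete = plan_static_rollout(old_keys, old_keys, new_keys)
```

Nothing is deleted until the new release is live, which is what keeps both sets of hashed files reachable during the switch.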

Doing everything with tutor would help a lot in preventing some of the problems we face right now, and it will help to have working/reproducible builds that we can rebuild on demand to test if old releases are working, while also having a backup of working images of them.

Best,
Felipe.

[1] GitHub - eol-uchile/eol-edx

3 Likes

Hi Régis,
I fully agree that Open edX should have a renewed setup for production environments. I believe docker + kubernetes are state-of-the-art and now suitable for production, so they should certainly be considered. I am not an expert in these technologies, but I can help in thinking through the principles of an infrastructure architecture. The design should keep scalability in mind. Components like LMS, mysql and mongodb have VERY different infrastructure behaviour (e.g. disk access, memory, cpu, persistence vs. stateless, etc.), so we should decouple them and design scalable infra for each. Let me know how we can collaborate to arrive at a recommendation.

1 Like

Hi @regis, thanks for bringing this up!

In response, we’ve discussed this internally at OpenCraft and decided to evaluate whether both using Tutor for our projects and pushing it as the de facto reference implementation (one is dependent on the other) make sense for us. We’ll begin playing around with it seriously over the next couple of weeks, after which we’ll probably have lots of questions and suggestions.

To start with, though, let me bring up a couple of things that are important for us, and we’d like to hear from you (and the community), what you think:

  • How’s master support coming along? How easy will it be to put it under CI/CD whenever a new edx-platform@master commit comes down the pipes (so we can fix it if the alert is sounded)? We deem this to be an essential first step before considering it as a (or the) reference implementation.
  • Do you think it is reasonable to expect Tutor to be able to use the same Dockerfiles that edX will (reportedly) provide and maintain as a result of OEP-45? In an ideal world, even if edX don’t make their own orchestration configuration public, it would be best for the community if everybody’s using similarly constructed container images.
  • Kubernetes is the only sane container orchestration engine that scales. Do you envision Tutor not only continuing to support it, but expanding support as the need arises?
  • Last but not least (for now), we’re concerned about governance: edX has recently come out with the Core Committer program, of which you’re one of the first few (congrats, btw :). Would you be willing to consider something similar for Tutor?

Note that if we end up pushing forward as suggested, you’ll probably be getting plenty of contributions from us on all of the above areas. (Fixing master when it breaks, contributing a way to factor theming out into a volume, etc.) The idea being that more people start to do the same, hopefully. :slight_smile:

3 Likes

Thanks for your detailed comment and your enthusiasm Adolfo! It would certainly be great to count Opencraft among the Tutor users and contributors.

First, let me clarify something: a few people messaged me saying “I saw your post about Tutor becoming the new default community installation”. My message was unclear: although I would be thrilled to see Tutor become the default recommended installation method, I am not pushing for Tutor to replace the native installation. Instead, what I am saying is that a task force should already start working on a default, community installation of Open edX that implements OEP-45. This task force might end up concluding that Tutor is the right choice (and again, I think that would be great), but people need to start thinking about this, because edX is not going to make the first move.

That being said, let me try to answer your questions.

I made a proof-of-concept of running the edx-platform master branch in the edge branch of the Tutor repo. In the future, I expect that developers who want to run or test the Open edX master branches will run from this edge branch. I’m not sure yet whether we will provide images for this branch.

It’s not going to be trivial, but it should be easy enough. The current CI is based on a Gitlab instance and runs on Kubernetes. Both Gitlab and K8s are self-hosted, so it should be scalable and configurable at will. Tutor also uses the Travis CI but this is just for building the Tutor binaries, and I don’t expect to use it for other tasks.

I understand the point, and this would be possible, but I don’t expect this will be the most reasonable choice. The Docker images are the central pieces of the puzzle, so Tutor needs to have full control of how they are generated. For instance, I went to great lengths to generate static assets in the Docker images in a reliable and efficient way. The edx-platform Docker image simply runs paver update_assets which I found does not work well.

Another difference is how Tutor serves static assets: Tutor recently switched from Nginx to Whitenoise and it works great. (And yes, it scales well :slight_smile:) It’s those small differences that add up and make edX’s and Tutor’s Docker images quite different.
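For readers unfamiliar with Whitenoise, here is a minimal sketch of how it is typically enabled in a generic Django project. It is not copied from Tutor or edx-platform: the `STATIC_ROOT` path is an assumption, the middleware list is shortened, and edx-platform may well need different static file storage settings than the one shown.

```python
# Hypothetical settings fragment for a generic Django project.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    # WhiteNoise's documentation recommends placing it directly after
    # SecurityMiddleware, so that it can answer static requests early.
    "whitenoise.middleware.WhiteNoiseMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
]

# Where `collectstatic` writes the files that WhiteNoise serves (assumed path).
STATIC_ROOT = "/openedx/staticfiles"
STATIC_URL = "/static/"

# Optional: hashed file names plus compressed variants, provided by WhiteNoise.
STATICFILES_STORAGE = "whitenoise.storage.CompressedManifestStaticFilesStorage"
```

With this setup the application process serves its own static files, so the image does not need a separate web server just for assets.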

That being said, if both images become sufficiently well aligned, I don’t exclude the possibility that one will inherit from the other. It’s just that I don’t expect this will happen any time soon.

I’m not sure I agree with your statement (especially the “sane” part), but yes Tutor will continue to support the Kubernetes deployment target. I’m not very strong with K8s operations so K8s is not my priority at the moment. For instance, I did not give enough attention to this PR which improves K8s support but seems to break the docker-compose compatibility.

Currently, Tutor is a thin wrapper on top of basic K8s manifest files which can be deployed with kubectl apply. This makes it easy to extend and customize an Open edX platform for K8s administrators. For instance, I don’t think it makes sense to provide a one-click installation for Kubernetes. It’s reasonable to expect that K8s administrators know what they are doing and have the skills to perform slightly complex operations manually.

However, I think it would make sense to develop a Tutor operator for Kubernetes: a sort of CD operator that would automatically build, push and re-deploy images as changes are pushed to the various repos.

So to answer your question: yes, I’ll be happy to extend support for K8s but we need to talk about what we want to add :slight_smile:

Absolutely. While one person is enough to develop and upgrade Tutor and all its supported plugins, it becomes near impossible to do this just by myself when we need to keep the master branch running daily. So I would need help from the community to maintain the edge branches of Tutor and its plugins. However, I want to keep a veto right for all new features: a Tutor-deployed Open edX platform should remain lean and simple, so I’m uninterested in pull requests that add features to be used by a small minority of users. If someone wants to do things differently, the best approach is that they develop their own plugin. Of course, if something in the Tutor codebase prevents them from developing such a plugin (such as a missing {{ patch(...) }} statement) then we’ll try to fix that for them.
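To illustrate the idea behind a `{{ patch(...) }}` statement, here is a toy sketch of a template with a named extension point that plugins can fill. It uses a regular expression instead of a real template engine, and the patch name, the function name and the template content are all hypothetical; it is not how Tutor itself implements the feature.

```python
import re
from typing import Dict, List

# Matches {{ patch("name") }} written with single or double quotes.
PATCH_RE = re.compile(r"""\{\{\s*patch\(\s*["']([\w-]+)["']\s*\)\s*\}\}""")


def render(template: str, plugin_patches: Dict[str, List[str]]) -> str:
    """Replace each patch statement with the snippets registered under its name."""

    def substitute(match):
        # A patch that no plugin fills renders as an empty string.
        snippets = plugin_patches.get(match.group(1), [])
        return "\n".join(snippets)

    return PATCH_RE.sub(substitute, template)


# Hypothetical template and plugin content.
template = 'server {\n  listen 80;\n  {{ patch("nginx-extra") }}\n}\n'
patches = {"nginx-extra": ["client_max_body_size 10m;"]}
rendered = render(template, patches)
```

If the template lacked the patch statement, a plugin would have no place to inject its snippet, which is the kind of gap the paragraph above refers to.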

Also, we would discuss the short-term roadmap with the other maintainers. For instance, here’s what I have in mind as the next major evolutions:

  • Replace Nginx with Caddy: this would make it much simpler to handle SSL/TLS certificates. However, it would break compatibility with existing plugins, so we would need to do the migration incrementally: first, add a Caddy server on top of Nginx, then remove services from Nginx, then remove the Nginx server altogether.
  • Replace Rabbitmq and Memcached with a single Redis instance: according to my preliminary testing, this should be possible and would simplify the platform architecture.

This is really the best thing that could happen to Tutor :slight_smile:

4 Likes

I would love to see EdX.org and the OpenEdX community align on as much shared code as we can. Edx.org runs openedx at significant scale, with dozens of daily code deployments and millions of active users. Therefore, our scalability needs are not going to be applicable to most people who run openedx - we will make decisions based on our user request load and our budget that simply don’t make sense for anyone else.

One of those decisions is static assets. When we first started open sourcing our deployment codebase, we were building and deploying ec2 AMIs once a week. We didn’t have versioned assets, and our frontend code was relatively simple and generally backward compatible. So we decided to couple static assets with the application code in a single machine image. This simplified our deployments and testing. However, when we scaled up our deployments to happen multiple times a day, we discovered that this method of serving static assets directly from the application server was incompatible with versioned assets, blue/green deployments and our CDN. Long story short, the new application server would tell a user to get an asset: /css/somestylesheet-f287eea.css. The user’s browser would reach out to our CDN, which would reach out to the application servers to fetch the asset. However, 50% of those requests would hit an older server that didn’t know about that version of the stylesheet and return a 404. Frontend error rates were spiking to nearly 100% of all sessions with every release, and would continue until our CDN’s negative result cache expired. So edx.org ships static assets directly to S3 during our build/deployment process. This is only necessary because of the size of our userbase and the fact that we do a blue/green deployment 10x a day.
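The failure mode described above can be reproduced with a toy model: a CDN that fetches from application servers in round-robin order while old and new servers coexist. This is a simplified sketch, not edx.org's setup; the old file name is invented, and the CDN's negative result cache (which made things worse in practice) is not modelled.

```python
from itertools import cycle
from typing import Sequence, Set


def count_404s(asset: str, origins: Sequence[Set[str]], requests: int) -> int:
    """Count misses when a CDN asks each origin in turn for `asset`."""
    rotation = cycle(origins)
    misses = 0
    for _ in range(requests):
        origin_assets = next(rotation)
        if asset not in origin_assets:
            misses += 1
    return misses


old_server = {"/css/somestylesheet-0a1b2c3.css"}  # hypothetical old hash
new_server = {"/css/somestylesheet-f287eea.css"}
new_asset = "/css/somestylesheet-f287eea.css"

# Mid-deployment: half of the origins only know the old asset version.
coupled_misses = count_404s(new_asset, [old_server, new_server], 10)

# Assets shipped to shared storage: every request sees both versions.
shared_store = old_server | new_server
decoupled_misses = count_404s(new_asset, [shared_store, shared_store], 10)
```

In this model half of the requests for the new stylesheet fail while assets are coupled to the servers, and none fail once both versions live in one shared store.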

Another question is SSL - we use cloudflare, AWS ACM, and Letsencrypt to terminate SSL in our various environments. However, we do not re-issue SSL certs for each of our development environments because we would very rapidly hit the letsencrypt domain certificate limits. We therefore use wildcard certs in our dev environments so that we don’t have to re-issue certs. Moving from nginx to caddy doesn’t make sense for us.
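The reason a wildcard certificate avoids re-issuing is that one certificate name covers every single-label subdomain. A minimal sketch of that matching rule follows; the domain names are placeholders and the function is illustrative, not part of any edX tooling.

```python
def covered_by_wildcard(hostname: str, cert_name: str) -> bool:
    """True if `hostname` is covered by a name such as '*.dev.example.com'."""
    host = hostname.lower()
    name = cert_name.lower()
    if not name.startswith("*."):
        # Not a wildcard: only an exact match counts.
        return host == name
    suffix = name[1:]  # e.g. ".dev.example.com"
    if not host.endswith(suffix):
        return False
    label = host[: -len(suffix)]
    # The wildcard stands for exactly one non-empty DNS label.
    return bool(label) and "." not in label


# Placeholder names: each new dev environment reuses the same certificate.
reusable = covered_by_wildcard("pr-123.dev.example.com", "*.dev.example.com")
```

Any number of short-lived environments under the same parent domain can therefore share one certificate, with no new issuance per environment.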

So, much like Regis wants tutor to remain focused on generally useful features and configurations, we (edx.org) aren’t always going to be a good match for what is in the community’s best interests. I welcome requests for information, and I would love to figure out some ways we can share as much deployment code as possible. Our push to get more and more applications building docker images to https://hub.docker.com/u/openedx with a sane production ready default configuration is one of those. These are the docker images we are running in production in our k8s environment, and they’re also what we’re basing our next gen devstack and sandbox code on, and I’d like to keep that true for as long as we can.

Maintaining ansible code is burdensome for our developers and a hindrance to our ability to move quickly. We’re going to try to keep doing it for a little while longer, but I really hope Ansible isn’t a part of the “L” openedx release at all.

Please let me know what else we can do, or if there’s anything that we’re doing that’s not helpful.

1 Like

Thanks @fredsmith for this detailed information! This confirms my impression: that edx.org and the rest of the community might have different deployment strategies, but they are reconcilable. I want to provide good defaults for most Tutor users, but I also want to enable more advanced users to replace some of the pieces of the deployment strategy with their own – like you do with edx.org. So in the end Tutor might not be a 100% good fit for edx.org, but I aim to make it a good starting point for system administrators who need both a scalable and easy-to-administer Open edX platform.

As I mentioned elsewhere, in my opinion, three things are missing for Tutor to be a good replacement for the native installation:

  1. A “bleeding edge” working deployment of the master branches.
  2. A clear transition path from the current native installation.
  3. Passing unit tests from inside the Tutor Docker images.

All three items are on the roadmap.

It was discussed in the last contributors’ meetup whether the group that decides on the future of the community installation should be part of the BTR working group. I think this project falls squarely within the attributions of the working group. It makes little sense to dissociate the release process and the testing of the actual release installation (IMHO). So we can discuss the progress of the “community installation v2 task force” (that’s a mouthful) during the weekly BTR meetups.

@arbrandes Please don’t hesitate to get in touch if you have any questions about Tutor – but you know that already, right? :wink:

To give an alternative to a full migration to tutor, we could instead start migrating services to docker in the current edx/configuration repository, and have ansible install docker and start the services (we could probably have supervisord start the docker images).

This allows us to migrate one service at a time, reusing all the configuration already done by the configuration repo, and give us an incremental upgrade path for all services, as not all of them have a ready to use docker image right now.

As some of you already know, I’m currently working on evaluating Tutor as such a replacement, and while I’ll soon be able to add a few more items to this wishlist, these 3 items are exactly the most important. :slight_smile:

I was about to ask whether we should have a separate meeting and found this post, so I guess that basically answers it. I’ll start attending the BTR meetups with the express purpose of discussing the community installation and my Tutor findings.

Yup, I know, but thanks for reiterating that. :slight_smile:

Felipe, I’m not sure using edx/configuration for anything will be a good idea: there’s every indication that it will soon be deprecated entirely. But we’ll see: I’d love your input in the next BTR meetup on the 7th.

I suggest that there is at least one iteration (it could be two) of an Open edX release that will be able to run with the current native installation and the future community installation. Let’s call it a transition period. And after the transition period, only the community installation should be available.

We also need to make sure that the future community installation can be installed not just with the Open edX branches at edX but also with branches from forks of these repositories like ours at EDUlib or at eduNEXT or FUN for example. I know Tutor already supports that, as I was able to do it successfully the last time I tried.

Good point, Pierre.

Also an excellent point. I’ll add these two to the list of things a solution needs to support. Thanks! :+1:

(I considered opening up a new topic, but I figure this is where the conversation is happening, so let’s keep it that way.)

During the contributor’s meetup earlier today, a few important topics were raised about the future of the community installation. Here’s the rundown of the two main ones:

  • @nimisha reports that edx-configuration is already undergoing de facto deprecation internally at edX. She also suggested we talk to @Cory_Lee about how this is taking place: we (as a community, I mean) might be interested in finding out how edX is going about this. We might want to use some of the same tools. I’ve invited Cory to the next westerly BTR meeting, next Monday.
  • @antoviaque, @nimisha, and others raised the point that there are several efforts around containerization, testing, and CI/CD that are (or should be) entwined with the community installation, however it ends up coming about. This needs to be tracked somewhere. Currently, we have this thread, the BTR meetings, OEROADMAP-21, DEPR-122, and BTR-43, all with slightly different but overlapping purposes. I suggest we rally around DEPR-122, which if nothing else will give us all the motivation to get this done. :slight_smile:

I’ll be building out a list of parameters that a community installation must adhere to as part of BTR-43. While it’s specific to Tutor for now, if for some reason it turns out that Tutor’s not the best tool for the job, we can morph it into a more general task later.

Whatever the case, feel free to contribute thoughts on what the community installation should and shouldn’t be right here. (Much like how @sambapete did above. :slight_smile:)

7 Likes

It would be great if the community installation could decouple some key elements, in order to allow deploying into different virtual hosts:

  • Application: edxapp and all stateless code that can scale horizontally
  • RDB
  • MongoDB
  • File storage (s3 or any other mounted FS that can be shared among app instances)
  • RabbitMQ

Then there should be a means to configure where those elements are, allowing both installing everything in the same host, or in different ones.
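One way to picture such a configuration is a small mapping from each component to the host it runs on, which covers both the single-host and the multi-host case. This is a hypothetical sketch: the component keys mirror the list above, while the class, function and host names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Component keys taken from the list above.
COMPONENTS = ("app", "rdb", "mongodb", "file_storage", "rabbitmq")


@dataclass
class Topology:
    """Maps each component to the host it is deployed on."""

    hosts: Dict[str, str]

    def __post_init__(self) -> None:
        missing = [name for name in COMPONENTS if name not in self.hosts]
        if missing:
            raise ValueError(f"no host configured for: {', '.join(missing)}")

    def components_on(self, host: str) -> List[str]:
        return [name for name in COMPONENTS if self.hosts[name] == host]


def single_host(host: str) -> Topology:
    """Everything on one machine, as in a small installation."""
    return Topology({name: host for name in COMPONENTS})


# Hypothetical host names.
small = single_host("edx.example.com")
split = Topology(
    {
        "app": "app1.example.com",
        "rdb": "db.example.com",
        "mongodb": "db.example.com",
        "file_storage": "s3.example.com",
        "rabbitmq": "queue.example.com",
    }
)
```

The same installer could then read one such mapping regardless of whether the values all point to the same host or to different ones.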

3 Likes

So, as promised, here’s the video from Cory’s brain dump on the future (and present!) of edx-configuration:

And these are the notes, which are still being improved upon. If you have anything you feel we missed from the meeting, feel free to contribute!

The short-short version as it pertains to this topic is:

  • The exact timeline for full deprecation of edx-configuration is still undetermined: edX still uses it to generate its edx-platform master images
  • These master images contain an alternate way of managing yaml/json configuration that does not require a full ansible run when changes are made, called RemoteConfig/Hermes
  • Hermes is not the future: edX is slowly moving to Kubernetes and Helm - but hasn’t yet for the edx-platform IDAs
  • edX won’t be publishing the Helm charts it uses in production, but may consider maintaining a sample version, if it is deemed necessary by the community
  • Dockerfiles for each IDA will be provided in each repo
  • There is a desire by both edX and the community to unify so that as many people and organizations as possible are using the same upstream codebase - but as far as edX is concerned, not at the expense of the freedom to choose their own internal deployment strategy

2 Likes

After discussion in today’s contributor’s meetup, it was agreed that the best way to submit my findings on replacing edx/configuration with Tutor would be as an ADR under OEP-45. So here it is, finally:

I urge everybody even remotely interested to take a look and comment. This is our chance to rally around (or shoot down! :slight_smile:) a concrete proposal, one that with your help can be made good enough for all of us.

Tagging @regis, @nimisha, @nedbat for awareness. I’m sure you’ll have lots to add to the discussion.

3 Likes

I’m generally in support of this ADR, and I think it’s the direction we’ve internally been hoping this would go.

Edx.org is currently automatically publishing production ready docker images at https://hub.docker.com/u/openedx for most of our applications. The intention of publishing these images is to provide a basis both for community docker/k8s deployments and for our own k8s based deployment. We are actively running many of these images in our production environment today, and working towards a future where all of our applications on edx.org deploy via docker images in k8s. I, personally, would love to see a way that these images can be used in tutor, and am hopeful that we can collaborate on them.

Edx’s k8s deployment is complex and our attempts to use tools like helm to simplify and open source it have been largely unsuccessful. There are a lot of moving parts and a lot of complicating factors that make it difficult to generalize, the way we did with edx/configuration and ansible. I don’t think that is something we’re willing to commit to.

As far as maintaining edx/configuration, we are still using it internally for sandboxes and many of our production deployments. Since we’ve yanked out all of the application configuration into remote config and the production.py django environment, there are now very few updates to edx/configuration by edx developers to support new features in the platform. I anticipate that it will continue to function for at least a few more releases just because so much of its functionality has been “outsourced” to files within the application repos that are used by both docker and ansible-based deployments.

I had hoped that we would be further along in this process by now, both internally and as a community, but 2020 has not exactly gone to plan for anyone. I’m hopeful that we can make some progress towards faster, more reliable deployments, and a more approachable platform for the community.

5 Likes

We’re going to have a chat with edX folks on 2021-02-04T20:00:00Z specifically about using Tutor as a Devstack, using Criterion 7 in the ADR as a basis for discussion. Readers are welcome to join, though if you can’t, fear not: the meeting will be recorded and the video posted here.

Gcal event link
Build-test-release gcal link (ical version)

3 Likes

Here’s the video (and chat log) of the Devstack Pow-Wow we held yesterday. Big thanks to all who attended!

The brunt of the meeting was about clarifying what Tutor can, can’t, and might be able to do, but here are a few highlights of the discussion that ensued:

  • The so-called “distributed devstack” project was put on hold by edX after some experimentation, but there are plans to improve documentation in the official devstack
  • edX will start investigating Tutor as a development environment for internal use; “why develop two tools to solve the same problem (caveat: if it’s indeed the same problem)”
  • @regis will start working on a way for Tutor to support the master branch of edx-platform - once this is ready, we’ll notify @feanil so he can try it out
  • The question was raised as to whether it would be possible for Tutor to use the same Dockerfiles as edx-platform: no conclusion was reached, except that it doesn’t seem to be impossible and warrants further discussion
  • @Diana_Huang’s cat wanted to eat her webcam at around 00:29:29

IMHO, a very good meeting. 10/10, would do it again.

4 Likes