Tech talk/demo: Deploying multiple Open edX instances onto a Kubernetes Cluster with Tutor

The idea behind removing Redis from the cluster is that it is a stateful component, and the volume is by default in the node’s storage.
During failover tests, I’ve found that if you kill a node, a new node is generated, and all lost pods are recreated in the new node. However, stateful pods may fail due to node affinity if the new node is in a different AZ from the original. The final solution for this is to use NFS volumes, but this is too complicated for such a simple deployment. That’s why we preferred to remove Redis from the cluster. Maybe you have a better solution for this!
Besides Redis, the only stateful pod is Caddy. For this we found that using an emptyDir volume fixes the problem, but I’m still in doubt.
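To make the emptyDir idea concrete, here is a minimal sketch of what such a Deployment could look like, built as a plain Python dict and printed as JSON. The names used here (`caddy_deployment`, the `caddy:2` image tag, the `/data` mount path, the `openedx` namespace) are assumptions for illustration, not what Tutor actually generates.

```python
import json


def caddy_deployment(namespace, image="caddy:2"):
    """Build a Deployment manifest whose only volume is an emptyDir.

    All names, the image tag and the mount path are illustrative assumptions.
    """
    labels = {"app.kubernetes.io/name": "caddy"}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "caddy", "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "caddy",
                            "image": image,
                            # Mount the scratch volume where Caddy keeps its data.
                            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                        }
                    ],
                    # emptyDir lives on whichever node runs the pod, so there is
                    # no PersistentVolume pinning the pod to one availability zone.
                    "volumes": [{"name": "data", "emptyDir": {}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(caddy_deployment("openedx"), indent=2))
```

The trade-off, which may be the source of the doubt: an emptyDir is deleted when the pod leaves its node, so anything Caddy stored there (such as issued certificates) has to be obtained again after a reschedule.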

I think that we should have an official reference architecture that meets all the best practices for a full scale production environment. As you said, then each one can choose to what extent to implement. But at least we should depict the best practices and desirable characteristics of such a deployment (security, resiliency, scalability, etc.) and how to achieve them.
AWS has the Well-Architected Framework, which can be used as a starting point. It is quite general and can be applied to any cloud or on-premises environment.

1 Like

Ah, I see. We don’t treat Redis as a stateful component. While Redis can be configured in a manner which allows for some stateful guarantees, we do not depend on these for production use and don’t recommend anyone put anything within it they are unwilling to lose. So that explains the different approach.

1 Like

@andres do you use Terraform or CloudFormation, or something else to manage your infrastructure? Since it seems like we’re going to work with @lpm0073 to see if we can adapt our Terraform repository to cover his use cases as well as ours, I’m wondering if it would be possible to also adapt it to be able to provision your reference architecture as well.

I use AWS CDK. I know nothing about Terraform, but it shouldn’t be difficult to learn. I wonder if Terraform can deploy any kind of AWS resource. It would be great if we could all join our efforts in one solution that’s better for all.

1 Like

I have been trying to follow this discussion as it develops, but it was going faster than I could focus on it to reply with some of our thoughts from edunext.

Both this thread and the cloud architecture for AWS are very thorough, and the collaboration that is emerging is great. However, I sense that it is focusing precisely on the part of the issue that will be more difficult to agree on.

Things like:

  • using the same gitlab, github, bitbucket or any other git service + CI combination
  • tenancy decisions: one install per cluster, install per namespace, multiple tenants per install
  • the infrastructure we will run the openedx services in (aws, digital ocean, azure, …)
  • how to sync the k8s manifest with the cluster

Also, in my view, although infrastructure provisioning can be painful, it most often ends quite quickly and you are good for a relatively long while. This is even more so once the switch to k8s is made. Provisioning the cluster is a one-time thing, but keeping up with the manifest and what is inside the cluster is what requires attention for longer. This was the initial idea we started with when we set out to develop shipyard at edunext.

I might well be wrong on this, and the push to make a reference installation starting from the cloud components and using Terraform might be the way to go. I won’t stand in the way, and we will undoubtedly collaborate on it and bring the experience that we have gathered to it. However, I’d like to comment on a different approach. We have for years already agreed on good practices and patterns that we can leverage for this:

  • open/closed principle for architecture.
  • tutor is great software and it does its job very well
  • kubernetes is a solid choice (even if some installations decide to use something else for orchestration)

Also, we all know how big and unruly configuration grew to be, and we don’t want to follow the same path, with the main difference being that the code is split across a bunch of repos instead of one.

What I’m thinking here is that we could take some lessons from the governance of big and complicated repos such as edx-platform and set up a project where we share our common experience of hosting and maintaining openedx installations, especially those of larger sizes.

This piece of code should be very much oriented to ease of extension, but it would also have a well-defined list of maintainers for the different parts of the core. The main goal of this project would be to render files that can be used as a manifest for a k8s cluster that hosts Open edX. How you apply this manifest to the cluster is everyone’s own choice. It could be using a hosted CI or with tutor k8s init. This would also be covered by the same ease of extension policy.

This is quite a raw idea, and if I were reading this I would be like “sounds nice, show me the code or I’d call bs”. Just throwing it out here to see if there is any traction. I’ll get started on a prototype to prove to myself that this is actually worthwhile.
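To make the render-then-apply split a little more tangible, here is a rough sketch (not the prototype itself) that only renders manifests for several instances, one namespace each, and leaves applying them to whatever tool you prefer. Everything named here is hypothetical: `render_instance`, `render_all`, the `openedx-settings` ConfigMap and the `LMS_HOST` key are placeholders, not an existing tutor or plugin API.

```python
import json


def render_instance(name, lms_host, overrides=None):
    """Render the manifests for one instance as plain dicts.

    The resource names and settings keys are hypothetical placeholders.
    """
    namespace = {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}
    config = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "openedx-settings", "namespace": name},
        "data": {"LMS_HOST": lms_host},
    }
    # Extension point: operators can add or replace settings without
    # touching the core renderer.
    config["data"].update(overrides or {})
    return [namespace, config]


def render_all(instances):
    """Render every instance into a single JSON document of kind List."""
    items = []
    for name, host in sorted(instances.items()):
        items.extend(render_instance(name, host))
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2)


if __name__ == "__main__":
    print(render_all({"campus-a": "a.example.com", "campus-b": "b.example.com"}))
```

The renderer never talks to a cluster, so the output can be committed to git, diffed in CI, or piped into the tool of your choice.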

Further considerations:

  • I’m talking about a flexible renderer, and tutor is an amazing and extensible template and file manager. It follows naturally that the piece of code we write be a tutor plugin. What I’d like is for this project to lift the burden of supporting the issues that arise while operating tutor. If this is a non-issue (which I don’t think it is) and we are better served by giving better support to the tutor plugins individually, please say so.

  • this project would probably be better off living in the openedx org so that there is no space for provider egos anywhere. It should also be covered by the core-committer program.

cc @jhony_avella @MoisesGonzalezS @mgmdi

4 Likes

I wanted to share a general update with this audience. This afternoon I released v1.0.0 of the Cookiecutter, the first general production release for fully automated build & deploy onto AWS. I have a continued interest in porting this code base to Digital Ocean, GCC, Azure et al., as well as to other popular CI platforms.

Related: the original GitHub Actions build & deploy workflows have been refactored into a collection of reusable components that you’ll find here: https://github.com/openedx-actions. These components greatly simplify working with Open edX on Kubernetes, and most are designed to work independently of the Cookiecutter itself. Today I also promoted several of these actions to v1.0.0. See the READMEs on each regarding any details that might bear on your decision of whether or not to incorporate these into your projects. The following actions are now available for general production use:

@andres “any kind of aws resource” is a very broad statement; however, I can confirm that to date I’ve been able to find high-quality vendor-supported Terraform components for everything that I needed for the Cookiecutter automation of building the AWS environment, which includes: VPC, EC2, EKS, ECR, RDS, IAM, S3, ElastiCache, Certificate Manager and Route53.

@braden @lpm0073 @andres @gabor @keithgg @regis @jhony_avella @Felipe To speed up the process a bit, should we do a synchronous meeting, with the goal of defining a shared project scope for this that we would all agree to adopt and maintain together?

Here is a poll to find a time to do this. If you would like to participate, please add your availability :slight_smile: Try to select as many slots as possible, to help find a common time:

The results will be there.

3 Likes

hello all, apologies to all for my absence. I’ve been down with COVID for the last couple of weeks, but feeling better now :slight_smile:

3 Likes

@antoviaque great idea! I just added my availability.

1 Like

Thank you for filling the availability poll! :+1:

It looks like we have two slots where everyone who has answered is available – let’s do July 26th at 16:00 UTC? That’s right after the contributors meetup.

Zoom URL to join: Launch Meeting - Zoom

See you there! :slight_smile:

3 Likes

To follow-up on the latest meeting, should we plan a new one for 16th of August?
cc: @Felipe @antoviaque @lpm0073

Thanks to those who have attended this meeting. It was really nice to see everyone, and the conversation was fruitful!

Attendees

@MoisesGonzalezS @Neo @gabor @jhony_avella @Felipe @braden @andres @lpm0073 @keithgg @antoviaque

Action items

Resulting from the discussions:

  • @Felipe will look into Helm charts - shared helm charts to deploy Open edX on Kubernetes looks like a good option to collaborate with. He will post his review of it within the next couple of weeks
  • If this confirms that it would be worth using to try to work together, @Felipe and @braden will work out a concrete proposal document together, for shared helm charts to deploy Open edX on Kubernetes, which will then be reviewed by the group.

Recording

Chat log

00:07:40 Felipe Montoya: Miro | Online Whiteboard for Visual Collaboration
00:44:23 jhony: GitHub - eduNEXT/drydock
00:53:12 Gábor Boros: libdjango/templates · main · opencraft / Operations / OpenCraft Helm Charts · GitLab

2 Likes

We should definitely plan for a follow-up meeting – but maybe after we have completed the async steps decided during the meeting, and done at least one async pass of review on the document that will result from it? Synchronous meetings can be useful to resolve discussions that take a lot of back & forth – but for meetings to be useful there should be work & async discussions between them?

@Felipe @braden When do you think you will get a first draft up for review?

The immediate next step was actually on me. Here is an update:

After our meeting I started my helm investigation enthusiastically, and I think I have reached a point where it’s clear that writing helm charts for the openedx project is something that we at edunext would be very interested in.

Now, I suppose that we can write a plan together @braden. I’ll use this place to leave some ideas that I think would be ideal to tackle, as well as some general concerns I have.

For the project

  • we need to make it flexible and modular
    • many composable libraries?
    • put everything behind flags?
  • get started with the project already adopting oep-55
  • publish the charts to artifacthub.io
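One way to picture the “everything behind flags” option is the way Helm layers user-supplied values over chart defaults, merging nested maps recursively. The following is a small sketch of that idea in Python, not Helm itself; the component names and the `enabled` / `persistence` / `replicas` keys are assumptions for illustration.

```python
import copy

# Hypothetical chart defaults: every component sits behind an "enabled" flag.
DEFAULTS = {
    "redis": {"enabled": True},
    "caddy": {"enabled": True, "persistence": "emptyDir"},
    "lms": {"replicas": 1},
}


def merge_values(defaults, overrides):
    """Recursively layer overrides on top of defaults without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def enabled_components(values):
    """List components that are switched on (a missing flag counts as enabled)."""
    return sorted(name for name, cfg in values.items() if cfg.get("enabled", True))


if __name__ == "__main__":
    # An operator who runs Redis outside the cluster only overrides one flag.
    values = merge_values(DEFAULTS, {"redis": {"enabled": False}, "lms": {"replicas": 3}})
    print(enabled_components(values))
```

With flags, an operator’s override file stays tiny; the cost is that every optional piece adds a branch the maintainers have to test.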

The community politics

  • host it in a provider-agnostic place? e.g. the openedx org
  • use the core contributors program to handle participation
  • would we target this to eventually be a reference installation as per oep-45?

Learning from the configuration project

  • owner file or equivalent. I think this was the largest issue we found: trying to modify a role whose maintainer or main user was not clear
  • public roadmap from the start
  • we will not be able to merge and maintain everything that operators might want. The project should make it easy to extend this in other ways
  • sometimes the preprocessing of the values was the key element of a feature, but this was only a secure data thing that was never published. E.g: embargo CIDRs

Best practices

  • everything we have come to expect of an openedx project:
    • conventional commit
    • lots of CI actions to update, test on PRs
    • high test coverage. How is this achieved in helm?
  • what are the helm-specific best practices that we should adhere to?

I’ll be happy to work sync or async in the next step.

4 Likes

@Felipe That’s great! I’ll ping you early next week and we can start writing up the plan :slight_smile:

2 Likes

FYI, the plan for this is being formalized in OEP-59:

@braden @lpm0073 @andres @gabor @keithgg @regis @jhony_avella @Felipe @MoisesGonzalezS @Neo

To follow-up on the discussions in OEP-59: Deploying Open edX on Kubernetes Using Helm by bradenmacdonald · Pull Request #372 · openedx/open-edx-proposals · GitHub and get everyone up to date, would it be useful to schedule a follow-up to our last synchronous meeting? When we meet it also usually boosts the progress and decision-making, too :slight_smile:

And also, discussing with @Kelly_Buchanan this week we realized that there isn’t currently a good community working group to discuss this type of collaboration, and many other DevOps topics. The closest is the build-test-release working group, but it has its hands very full with managing the releases, and is understandably focused on it.

Maybe our current group could be a good starting point for a more general DevOps working group? From past experiences in other areas of Open edX, having regular meetings helps to find ways to collaborate more, and keeps the momentum of initiatives going. It could also help with liaising more with people from DevOps at 2U, which was what prompted this realization in the discussion with @Kelly_Buchanan – as 2U doesn’t use the named releases at all, the build-test-release discussions are often pretty far from their concerns.

5 Likes

Count me in the meeting! :blush: