Deploying Open edX on Kubernetes Using Helm

Hello @regis,

The Cookiecutter project by @lpm0073 builds the Docker images and pushes them to AWS ECR using a GitHub workflow. Every time we build an image it takes close to 40 minutes, because the build runs on GitHub-hosted runners rather than on a persistent machine, so there is no cache. To avoid this we can run these workflows locally (from a bastion machine), which means the next run will have a cache and take much less time. This makes small changes easier, particularly while developing new features.

Regards,
Sujit Kumar

@regis,

  • verifying the objective: build openedx docker images faster given the constraint that build and deploy workflows run from Github.
  • there is no host. there’s a k8s cluster where the deployed applications run, and there are ephemeral nodes at Github where build and deploy workflows are executed.

I’m unfamiliar with Act but thus far at this early stage I understand that it would allow Github Actions to run on a local host. If true, that sounds like a pretty good idea to me.

I use act routinely when I need to test GitHub CI before pushing changes upstream. Act works by running a Docker container locally. But building a Docker image inside a Docker container is not trivial. I used to build the Tutor images in a Kubernetes cluster. But that was just too difficult: very often the nodes were running out of disk space. Upgrading Kubernetes clusters was also a pain. So I moved back to building images locally: for that I use the docker:dind image, and it works great so far – though the setup is a little convoluted.

I suggest you either run a self-hosted GitHub runner, set up Docker caching, or figure out how to build Docker images remotely (with the dind images, for instance).
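As a rough illustration of the Docker caching option, here is a minimal sketch of a build command that stores and reuses layer cache in a registry, so that an ephemeral runner does not start from scratch each time. The function name, image name and cache reference are hypothetical placeholders, not taken from the Cookiecutter; the `--cache-from` and `--cache-to` flags are standard `docker buildx build` options, though depending on the builder driver the registry cache exporter may require a `docker-container` builder.

```python
import shlex


def buildx_command(image: str, cache_ref: str, context: str = ".") -> list[str]:
    """Assemble a `docker buildx build` invocation that uses a registry cache.

    `image` and `cache_ref` are hypothetical registry references chosen by the
    caller; nothing here is specific to ECR or to the Cookiecutter workflows.
    """
    return [
        "docker", "buildx", "build",
        "--tag", image,
        # Pull previously exported layer cache from the registry, if any.
        "--cache-from", f"type=registry,ref={cache_ref}",
        # Export the cache for the next run; mode=max also keeps the
        # layers of intermediate build stages.
        "--cache-to", f"type=registry,ref={cache_ref},mode=max",
        "--push",
        context,
    ]


if __name__ == "__main__":
    cmd = buildx_command(
        "registry.example.com/openedx:latest",      # placeholder image
        "registry.example.com/openedx:buildcache",  # placeholder cache ref
    )
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

The sketch only prints the command; whether a registry cache actually beats a self-hosted runner's local cache would need to be measured on the real images.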

thanks @regis. the self-hosted runners look very promising, especially given that these can run locally on Windows and macOS. i’ll read more about these and will probably have time later this week to try to prototype something in the Cookiecutter. i’ll report back after i know more.

i’ll heed your advice on Act.

Meeting 2023-01-10

Thanks to everyone for joining! I won’t be cc’ing everyone since we had quite a large cohort today :slight_smile: . Video recording to follow.

Cliff notes:

Transcribed meeting notes

Xavier

  • Introductions
  • Catch up from Xavier on what we’ve done so far for the newcomers.
  • Are there any areas that they’re interested in collaborating on, etc?
  • Requested further recap from Felipe

Felipe

  • We decided to collaborate based on Braden’s thread
  • This evolved into this working group which is working on tutor-contrib-multi
  • Obviously with the different issues that it entails

Xavier

  • The configuration repo has been deprecated, and we realised that we were all working on the same thing. Is it possible to get something nice/maintainable?
  • Last month we started the review of the current approach and with specific steps to verify that the current approach works to validate that this is a good base to be working on.

Adam

  • Joined edX 4 years ago and researched deploying to EKS clusters.
  • Took the notes app and containerised it. It’s been the only service running in k8s for 2.5 years.
  • Was deployed using Kustomize. They needed better customization especially with respect to liveness probes.
  • It’s still closed source, but it’s running about 7 new Django api endpoints. Some are public, but not the rest due to license concerns, etc.
  • Considered using an umbrella chart like tutor-contrib-multi, but migration is difficult, especially codejail.
  • They’re still considering how they’ll be able to deploy all services using k8s instead of the old configuration repo.
  • Asked who is running codejail behind Flask?

Felipe

Adam

  • 2U will consider eduNEXT’s approach to codejail
  • 2U has mostly figured out autoscaling and can contribute to the effort.
  • Currently using Nginx, but is interested in what the community is using, as that could turn out to be useful.
  • Best practices for Kubernetes come up within the org.
    • How to do liveness probes
    • Processes that have permission to write to disk
    • Etc.
  • There’s a Fairwinds article: Kubernetes Configuration Benchmark Report

Xavier

  • Are there things that we’ve worked on that 2U is interested in?

Adam

  • Codejail behind a Django API (not a Flask one) would be great, as they appreciate the consistency of it.
  • They’ve got an internal testing environment and it will be tested from there.

Xavier

  • Meeting is too short, what about next steps?
  • A good step after this meeting might be for Adam to comment on any of the open tickets, like the codejail one.
  • Then we could discuss there and have async discussion.

Braden

  • Collaboration helps us all even if not directly using the helm chart, because everyone benefits from the small changes.

Felipe

  • Working group for Devops
  • How do we move codejail to a common/shared roadmap?
  • We could tackle multiple projects at a time instead of limiting ourselves to a single goal.

Regis

  • Created a new initiative for the Devops working group.
  • There are already projects that are devops related (three)
  • It doesn’t make much sense to have a working group for all of these projects.
  • Instead Ed proposed a Devops working group, where all the Devops related projects will live.
  • Each project can have its own Slack channel, leadership, governance, rules.
  • Otherwise it can be handled within the Devops working group.
  • GitHub project: openedx/wg-devops (issue repository for the DevOps Working Group)

Adam

  • How does this differ from BTR?

Regis

  • BTR is more concerned with creating code releases and as such is distinct from Devops.

Xavier

  • What’s the approach to communicating with the Devops WG?

Regis

Adam

  • How to decide something fits within Devops or somewhere else?
  • How would we spin up/down working groups depending on the project?

Xavier

  • We try to take care of the issues for deploying larger instances.
  • Helm was a good starting point, but there could be changes in future especially with differences between small/large providers.
  • Is 2U interested in continuing the discussion?

Adam

  • Not sure.

Xavier

  • Adam mentioned some of the current issues that 2U is also facing. Could be helpful to comment on the tickets.
  • Or on the forum?

Regis

  • Trying to push forward refined/groomed issues. Good first issues to engage folks to start working on them.
  • Can this project define important work to attract newcomers?

Adam

  • 2U is trying to figure out the best way to deploy Kubernetes. In terms of stability, etc.
  • There’s nothing on scaling yet. Expects that 2U can start there.
  • 2U can contribute, but will very likely not use the project for a while.
  • Internal helm chart is fully featured. Enterprise ready at this stage.

Daniel

  • Discussed with legal on open sourcing the helm chart.
  • Following up with them again this week.
  • Expects no opposition, legal just doing their due diligence in terms of license/contributor guidelines.
  • Should hopefully be done by the end of this month.
  • Best practices are really important.
  • Scaling/Liveness Probes/Metrics used for Scaling/Observability/Monitoring
  • All of the above are good areas for collaboration

Jeremy

  • Trying to think about how to smooth the learning curve for deploying in production.
  • We’re deploying to k8s, but not developing with k8s.
  • Hoping to find a way to have more consistency across environments.

Adam

  • Autoscaling

Braden

  • Lawrence would be the best person to speak to at the moment.

Lawrence’s mic wasn’t working.

Xavier

  • Go over issues. Discussed the Nginx + Cert manager task with Moises.
  • HPA with Jhony.

My sound dropped out here (low headphone battery), so I didn’t get the full conversation

Jhony

  • Talked a bit about Karpenter and approaches to autoscaling.
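
For readers following the autoscaling thread: the notes don’t record the details of this discussion, but the core scaling rule documented for the Kubernetes Horizontal Pod Autoscaler (HPA) can be sketched in a few lines. This is only an illustration: the function name and sample numbers are made up, the 0.1 tolerance mirrors the documented default rather than anything decided in the meeting, and the sketch ignores min/max replica bounds, stabilization windows and pod readiness handling. Karpenter works at a different level (provisioning nodes rather than scaling pod replicas), so it is not covered here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(currentReplicas * currentMetric / targetMetric).

    If the metric is within `tolerance` of the target, the replica count is
    left unchanged to avoid flapping (assumed default tolerance: 0.1).
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Multiply before dividing to limit floating-point rounding surprises.
    return math.ceil(current_replicas * current_metric / target_metric)


if __name__ == "__main__":
    # Hypothetical numbers: 4 pods averaging 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```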

Xavier

  • Any objections to holding the next meeting at the same time in 2 weeks?
  • Quickly went over the issues in the Git repo.
  • Further discussion to happen async.

@keithgg Thank you for the meeting recap! And here is the video recording (chat log):

Next meeting

I’ve sent a calendar invite for the next meeting, which will be in 2 weeks, on 2023-01-24T17:00:00Z in this Zoom room. I’ve made it recurring to simplify the planning.

My apologies for not having properly followed the agenda for the last meeting btw, but I felt it was useful to hear from the 2U participants who have just joined us. This has delayed a bit the review of the status of the work we had scheduled for yesterday, as we didn’t have enough time to discuss it, but we’ll try to cover this asynchronously until next meeting, and dedicate more time to it during the next meeting.

Here is the proposed agenda for that next meeting - don’t hesitate to suggest any changes or additions:

  1. Assign scribe role, greetings & introductions as needed. (5 minutes)

  2. Kubernetes, Tutor & Helm (40 minutes) - Debrief of the work from the tasks list & formal review

  3. DevOps Working Group (10 minutes) - Continue discussing the coordination with the parent DevOps working group, and the formalization of our group (as a “big/multi instances” subgroup of devops?)

  4. Next steps & conclusion (5 minutes)

Current tasks

Since we had very little time to discuss the tasks, the follow-up is happening async on the tickets:

CC @adzuci


Hey folks, we need to figure out a new name for the project since it’s more about Open edX + Kubernetes + Multiple Instances than it is about Tutor, though it of course assumes use of Tutor for building container images.

Thoughts on any of these names? Vote for 1-2 that you like :slight_smile:

  • Baseline (openedx-k8s-baseline)
  • Catalyst (openedx-k8s-catalyst)
  • Harmony (openedx-k8s-harmony)
  • Ensemble (openedx-k8s-ensemble)
  • Common Hosting Environment for Containers on Kubernetes (Open edX CHECK)


Hi y’all! Here is the recap from the meeting we held on 2023-01-24.

This was a short meeting going over the current list of tasks in the GitHub repo with some focus on cloud-hosted development environments.

First PRs are ready for review, with discussion continuing on the created tasks.

Meeting notes

Braden

  • Thanks to the creators of PRs. Nginx has been reviewed; only the shared Elasticsearch still needs to be checked.
  • He will check it with eduNEXT
  • Please vote on the name changes for the project on the forum.
  • We will check up on the tasks.
  • NGINX work is basically done.

Jhony

Lawrence

Felipe

Braden

  • It only works with AWS. No reason not to support it for the folks who want to use it.
  • Mentioned OEP-45 and its current status.

Felipe

  • He is currently arbiter, but is also author. How does that work?
  • There hasn’t been proper support from outside the working group yet, so there’s an assumption of approval.

Braden

  • Is there a way to say it’s provisionally approved?

Felipe

  • Let’s keep it a draft. Will change status and add comments.

Braden

Jeremy

Braden

  • Does it require fewer resources than on a local machine?

Jeremy

  • Whether we can run only some services on it, or have to have the whole thing running, is a question not yet answered.

Felipe

  • They had the issue of wanting to run other things on tutor, but not the LMS.

Jeremy

  • In Devstack it’s possible, but wonky since LMS is the source of auth.
  • With tutor it’s not yet ready.

Braden

  • Looks like a nice comprehensive review.

Felipe

  • Codespaces + Github is great.

Jeremy

  • We don’t want to be using a custom orchestration system, then using something else to deploy.
  • Eg. Using Docker/Python on the dev machine, but then deploying using K8s.
  • Preferably devs should be using the same dev environment to deploy.

Braden

  • Anything else that we want to chat about?

Felipe

  • They want to present a talk at the conference about crunching the Kubernetes numbers.
  • And some sort of analysis of what they’ve found so far running large instances on Kubernetes.

Braden

  • OpenCraft can definitely collaborate and share some numbers. Sounds great.
  • Anything else?

Felipe

  • Regarding monitoring: are we interested in doing something that monitors the platform in a more useful manner?
  • Most of us are using Prometheus + Grafana.
  • That sometimes results in too much information that he doesn’t care about.
  • Wants to know Pod utilization vs heartbeat of LMS.
  • Is anyone else interested?

Lawrence raises his hand

Lawrence

  • Some guys at edX have made him aware of a product called Kubecost.
  • Had a convo with them last week, will be able to break down the cost service by service.

Felipe

  • Was thinking more of the values that he wants to monitor.
  • Database values/MySQL/Ingress values.
  • But going via cost is a worthwhile avenue.

Braden

  • We haven’t really looked at the cost, but it would be very useful.
  • Monitoring would be a really great thing to collaborate on.

Nothing else to discuss. Meeting ends.


@keithgg Thanks for the notes and the recap! :+1:

@braden Is there a recording of the meeting? I didn’t receive one from Zoom, but it looks like the calendar was the host?

@lpm0073 @keithgg Did that discussion happen? It would be useful to post an update on the ticket - it looks like some pings don’t get through to you @lpm0073 ?

I also see further down the notes that @Felipe and @braden are also interested in collaborating on this. What would be the next steps on this?

@Felipe @braden To clarify here, was the decision that we are ok to move forward with Karpenter? Could this be mentioned explicitly on the ticket “Karpenter?” (Issue #7 in openedx/tutor-contrib-multi)? And did someone volunteer to work on it?

Btw, there are a few questions on the topic of auto-scaling in “Add autoscaling” (Issue #2 in openedx/tutor-contrib-multi) that could use a few more eyes. Were those topics discussed during the meeting? A decision would be useful to be able to move forward. CC @jhony_avella

@Felipe Since the formal review period is over, it would be better to merge it provisionally - this way it becomes accessible at Open edX Proposals Index — Open edX Proposals 1.0 documentation and it’s clear that it’s the way we have agreed on for now. Then when refinements are done, we can always update the OEP. It makes it easier for others to open PRs against the document, too.

Yes, both are moving forward. I’ve commented on the first one to follow up with @regis. For the second one, we need those who want to have access to the repo to mention it on the ticket.

@jmbowman +100 to this! I have created this ticket on the repo to track the collaboration/support work on that end: Support for the Cloud-based developer environment · Issue #14 · openedx/tutor-contrib-multi · GitHub

@Felipe @braden Good idea to discuss this at the conference! It could be an occasion to advertise our work, to attract more contributors from the rest of the community?

I have created a ticket to track this: Conference presentation about Kubernetes on large instances · Issue #15 · openedx/tutor-contrib-multi · GitHub - I’ve also asked there about who is going to be presenting?

@antoviaque Unfortunately none of us could figure out how to do a recording. It said we didn’t have permission. I’ll have to learn what it means if the calendar is the host; I assumed it was you.

That’s what we actually decided on the call, just used the wrong word. See update from @Felipe here: OEP-45 :: ADR-002: Deploying Open edX on Kubernetes Using Helm by bradenmacdonald · Pull Request #372 · openedx/open-edx-proposals · GitHub

We looked at the ticket very briefly during the meeting. I posted an update on the ticket just now. IMHO this one is not urgent so there is no need to decide now while some of the questions of scope are still a bit fuzzy. I’m hoping we’ll get more context to make such a decision in the future after other pieces have fallen into place.

Yes we discussed it and @lpm0073 is still planning to do it.

We looked at it briefly and decided to follow up async so we have more time to consider it.

I’ll reply there.
