Deploying microfrontends (MFEs) in a single container

As part of my recent investigations into mass-deployment of Open edX with Tutor and thinking about how we deploy MFEs, I want to propose a method for deploying them on Docker that I haven’t seen discussed yet.

Challenge:

  • There are already around 20 MFEs, and there are going to be more.
  • If I deploy 100 Open edX instances onto a Kubernetes cluster, each with a different theme, each instance running ~40 MFEs, and each MFE requiring a custom image in order to be themed, that adds up to 4,000 containers running just for MFEs, each with a different image.

I don’t think that’s an ideal approach, especially for those of us who deploy dozens or hundreds of instances.

Proposal:

The openedx-frontend container would be a new Docker image, versioned and released along with other Open edX components. This one container image would contain all standard Open edX MFEs, Node.js, and a web server (like nginx) to serve them statically at different URL prefixes / virtual hosts. Each MFE has its own node_modules, and the image is built with all dependencies for each MFE pre-installed, but with no static files generated. So the layout of the image is something like this:

/edx/frontend/frontend-app-learning/
/edx/frontend/frontend-app-learning/node_modules/    <-- pre-populated
/edx/frontend/frontend-app-learning/dist/            <-- empty at first
...
/edx/frontend/frontend-app-gradebook/
/edx/frontend/frontend-app-gradebook/node_modules/   <-- pre-populated
/edx/frontend/frontend-app-gradebook/dist/           <-- empty at first
...

The entry point of this container would be a script which:

  • checks environment variables to determine which MFEs should be enabled
  • for each enabled MFE, runs npm install @edx/brand@$FRONTEND_THEME_PACKAGE and npm run build (all in parallel); and then
  • starts nginx
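
For illustration, here is a minimal sketch of what that entrypoint script could look like. The variable name ENABLED_MFES, the default MFE list, and the exact paths are placeholders rather than an agreed-upon interface:

#!/bin/sh
# Sketch only: error handling and logging are omitted for brevity.
set -e

for mfe in ${ENABLED_MFES:-learning gradebook}; do
    (
        cd "/edx/frontend/frontend-app-${mfe}"
        # Install the requested brand package, then build the static bundle into dist/
        npm install "@edx/brand@${FRONTEND_THEME_PACKAGE}"
        npm run build
    ) &
done
wait  # block until every parallel build has finished

exec nginx -g "daemon off;"

The whole container could then be configured in one place with a single (hypothetical) env file, e.g. docker run --env-file mfe.env -p 80:80 openedx-frontend:<named-release>.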

What this gets us is a single image that can be used by almost everyone to deploy whichever subset of MFEs they want to use, with custom branding. Because the entrypoint builds the MFEs, whatever environment variables you want (site name, LMS URL, brand/theme, feature toggles, etc.) only need to be set once for the container and will be applied to all MFEs.

The pros of this approach as I see it are:

  • You would rarely need to build a new container image; everyone in the community could use the exact same openedx-frontend image (versioned for each named release)
  • You only need to have a single frontend container running for each Open edX instance, and it has a single set of environment variables to control theming, feature toggles, LMS URLs, and so on
  • Doing things like changing the theme, changing feature toggles, or enabling a new MFE is as simple as changing an environment variable and restarting the container / rolling out a new version
  • A Kubernetes startup probe would ensure that as you roll out new configurations/versions of the frontend image, the previous version keeps serving traffic until the new version is fully built.
  • It’s easy to put a CDN in front of the container for faster performance
  • It provides a forcing function to encourage all MFEs to use standardized environment variables for configuration.

The cons are:

  • The container takes a while to start up during initial deployment, because it has to use webpack to build each MFE (though they can build in parallel on as many CPUs as your k8s nodes have). However, you have to do this build on initial deployment of a new Open edX instance in any case. This is only really different than other approaches if you accidentally shut down your frontend container(s) in production and you weren’t using a CDN; in that case it would be much slower to start back up.
  • The container is not entirely immutable, and because it installs a theme package in its entrypoint, it’s possible that it could display changing behavior depending on how you specify the version of the theme package.

However, for anyone who needs the extra reliability and startup speed of a fully immutable container, you have two very simple choices that fully remove the cons: (1) create a derived image from the base container (after every MFE has been built with your customizations) and deploy that, or (2) just use this container to build your MFEs, then copy the compiled files to S3; don’t use the container for deployment at all. Either way, it’s much simpler than dealing with dozens of separate containers (I think).
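
As a rough sketch of option (1), assuming hypothetical image tags, registry name, and env file, the derived image could be produced with little more than docker commit:

# Start the stock image with your configuration and let the entrypoint build all MFEs.
docker run -d --name mfe-build --env-file mfe.env openedx-frontend:some-release
# Once the build has finished, snapshot the container as a fully built, immutable image
# and deploy that instead of the stock image.
docker commit mfe-build registry.example.com/openedx-frontend-built:some-release-mytheme
docker push registry.example.com/openedx-frontend-built:some-release-mytheme
docker rm -f mfe-build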

Thoughts?

CC @xitij2000 @arbrandes @djoy @regis

Hey Braden! We are thinking along the same lines. I’ve reached almost the same conclusions this week. In particular, I want to centralize all MFE assets in a single image/container/volume, which will eventually be served by a web server.

Where my approach diverges is on the time necessary to build the assets, which you summarized in your first con above.

I think that the cumulative MFE build time is prohibitive: in my experience, each MFE takes about a minute to build, which amounts to 10 minutes for 10 MFEs. The problem is that this build is necessary every time a configuration setting is modified. In the case of Tutor, I want to allow users to quickly change the platform language, URL, plugins, etc. In practice, Tutor users frequently change their configuration, and I don’t want them to wait needlessly for their platform to be ready.

As a consequence, we need some kind of caching system for webpack. I looked at the native cache: {...} option and could not get it to work with Open edX’s version of webpack (AFAIU the cache option is simply ignored).

So what’s the best caching system left to us to build MFEs? I think it’s the Docker layer cache. Docker is really good at skipping unnecessary commands, such as:

# Docker re-runs these steps only when mfe.env (or an earlier layer) has changed
COPY ./mfe.env ./env
RUN set -a && . ./env && npm run build

But that means that we need to build a Docker image. This is fairly easy to achieve locally (tutor images build mfe), but on a Kubernetes cluster we need some place to build the Docker image. What is often done in k8s-powered CI is to build Docker images from inside a Docker container. This is called Docker-in-Docker (“dind”) and there are different ways to do this. Read this blog post for a reliable summary.

This is how far I got on this topic. I did not yet experiment with all the dind approaches, but none of them are very satisfying so far.

@regis Glad to hear you’ve had similar ideas :slight_smile:

Yup, I agree that the build time is the main issue here and should be a focus.

I don’t really understand your idea of using the Docker layer cache, because I don’t see how it actually speeds up the webpack step, which must be re-run whenever the config changes. But I’m in favor of whatever works.

I think there may still be low-hanging fruit in fixing the slow builds directly, which is worth exploring. Admittedly I’m a webpack-hater and not very familiar with the MFE build pipelines, but from looking at the MFE webpack.prod.config.js, I see lots of things that look like they may not be optimized for speed:

  • It explicitly says “All settings here should prefer smaller, optimized bundles at the expense of a longer build time.”
  • Source maps are enabled. The webpack docs say, “Source maps are really expensive. Do you really need them?”
  • No use of cache-loader (This may be what you really need to get repeat builds going faster. Things like the image-related loaders probably don’t need to be re-run at all when config changes.)
  • No use of thread-loader which can speed up some loaders like sass compilation
  • The babel config seems to be using preset-env without a target (“We don’t recommend using preset-env this way because it doesn’t take advantage of its ability to target specific environments/versions.”)
    • setting a target that excludes IE11 might help by skipping a lot of ES5 transforms, and is not unreasonable given that major sites like Facebook don’t support IE11 and Microsoft has started forcibly redirecting users to Edge on some sites and announced that Microsoft 365 sites are dropping support for it on Aug. 17, 2021.
  • webpack-bundle-analyzer is enabled to generate stats about bundle size, but for our purposes here, this only slows down the build and provides no value.
  • And another big one is that Webpack 5 promises much better performance, but the upgrade is probably a nightmare.

Of course, I’m just taking a look now; it’s possible that these things have been tried before and found not to work, and I’m missing that context. But it still seems worth investigating.

Huge +1 from me for this. There is, however, one con I would add to the list:

  • Running a custom MFE (one not in the default list) would require either rebuilding the image or running it as a separate container. The latter would be particularly troublesome if the new, expected way to run MFEs is a monolithic image.

That said, I don’t think it’s a show-stopper. Just felt it should be made part of the discussion. Maybe we can make it so the list of built-in MFEs is configurable via some plugin engine. I’m guessing Tutor will be pretty good at doing this from the start. :slight_smile:

That’s because what I said is not completely true :slight_smile: In fact, we need to rebuild the MFEs every time either of the following is modified:

  1. MFE config environment variables
  2. Upstream repo/branch

Quite often, we make changes to the platform configuration without affecting either of these values – but we have to figure out whether we did, and to do so we need to put in place some sort of fingerprinting (think recursive folder sha1, or something like that). We don’t have to do this ourselves, because this is exactly the kind of caching that the Docker image build cache performs efficiently.
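
To make that concrete, here is a rough, purely illustrative shell sketch of the kind of fingerprint-and-rebuild check we would otherwise have to write ourselves (the paths and the mfe.env file name are placeholders; this is not what Tutor or Docker actually run):

# Hash an MFE checkout (excluding node_modules) plus its config, and rebuild only on change.
fingerprint() {
    find "$1" -type f -not -path '*/node_modules/*' -print0 \
        | sort -z | xargs -0 sha1sum | sha1sum | cut -d' ' -f1
}
NEW="$(fingerprint ./frontend-app-learning) $(sha1sum mfe.env | cut -d' ' -f1)"
if [ "$NEW" != "$(cat .last-build-fingerprint 2>/dev/null)" ]; then
    npm run build && echo "$NEW" > .last-build-fingerprint
fi

Docker’s layer cache gives us essentially this behaviour for free, keyed on the build instructions and the checksums of the files copied into each layer.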

And to address @arbrandes’ concern about running custom MFEs, what I have in mind is either:

  1. Building different images for every MFE.
  2. Building a multi-layer image, with each independent layer corresponding to a different MFE.

In either case, we will not have to perform a full rebuild every time we update an MFE.

I understand this is all a bit abstract. I hope I’ll be able to make the approach more understandable once I come up with a working proof-of-concept.

From what I saw, not all configuration variables require a rebuild. I was trying to find time to experiment with moving all config variables into a single, separate file, so that rebuilding after a config change would only change that one file, which could potentially be loaded separately.

Mainly I came to this when thinking of ways to optimise theme building, which I was hoping to handle in a similar way.

@braden, whatever the final approach ends up being, we will almost certainly have to rebuild the MFE Docker image quite frequently. I’m wondering what your plans are at OpenCraft for setting up an image-building pipeline. Are you planning to host a registry and image builder in Kubernetes itself (with Kaniko, for instance)? If yes, are you planning to host a separate registry for every Open edX platform, or a shared registry between all users?

I’m asking because I’m struggling to find a proper solution for building the MFE image with Tutor on Kubernetes.

@regis What I proposed in this thread is a short-term solution given what we have to work with for now. I don’t have a good answer to those questions you have since I was hoping we’d take a different approach for future releases to really enable frequent rebuilds. (If I need a private container registry, I’d likely just use GitLab’s container registry since we’re already using GitLab CI + GitLab Terraform backend; I know that’s not a general solution suitable for Tutor though.)

What I really want to see in the long term is outlined in this PR comment: https://github.com/edx/open-edx-proposals/pull/164#issuecomment-718126777 - the short version is that I don’t think a complex theming pipeline should be built into Tutor; rather the core platform itself should have a “Theming MFE” that controls theming, and allows rapid customization and preview (WYSIWYG).


P.S. Based on things I’ve learned recently, switching from Webpack to Snowpack or Vite would likely unlock much faster frontend builds.

That sounds like an interesting discussion to jump into.

I am leaning toward @braden’s proposed option (2): using the container only to build the MFEs, then copying the compiled files to S3.

That, combined with the use of AWS spot instances. The idea is to use an instance with a lot of CPU/RAM to speed up the build process. And because that container doesn’t have to be running all the time, one could take advantage of spot pricing, which I guess is a perfect match for this scenario.

So in summary the flow would be something like this:

  1. A build is triggered (either by a GitHub Action or manually via the K8s dashboard, etc.)
  2. The trigger sends a request to a Lambda function, along with which frontend app(s) need to be rebuilt.
  3. The Lambda function starts the builder spot instance and passes the trigger event variables to it.
  4. The builder container/spot instance builds the assets, creates an S3 object, and passes the S3 object ID back to the Lambda (a rough sketch of this step follows the list).
  5. The Lambda function updates the frontend; for example, if it’s deployed behind a CDN, it can use Route 53 to repoint the frontend service to the new S3 object created in step 4.
  6. If all is good, the builder shuts down, and the previously generated copy can still be kept for rollback purposes (though we can delete the S3 object created at [t-2]).
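
As a purely illustrative sketch (the mfe.env file, bucket name, and prefix below are placeholders), step 4 on the builder could boil down to something like this, run from each MFE checkout:

# Hypothetical builder step: build one MFE and upload the result to S3.
set -e
set -a; . ./mfe.env; set +a                   # export the configuration passed along with the trigger event
npm install "@edx/brand@${FRONTEND_THEME_PACKAGE}"
npm run build
BUILD_ID="$(date +%Y%m%d%H%M%S)"
aws s3 sync dist/ "s3://my-mfe-bucket/frontend-app-learning/${BUILD_ID}/"
echo "${BUILD_ID}"                            # reported back so the Lambda can repoint the CDN (step 5)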

Please think about people who don’t use AWS at all, and avoid introducing AWS-only processes.

Fair enough; however, all those services I mentioned are pretty general and have alternatives in Azure or Google Cloud. Here they are explicitly:

  • AWS spot instances: Preemptible VMs (Google Cloud) or Azure Spot VMs
  • Route 53: Google Cloud DNS or Azure DNS
  • Lambda functions: Cloud Functions (Google Cloud) or Azure Functions

I just felt that the de facto cloud provider here is AWS, since the docs exclusively reference AWS, not to mention that OEPs get written because of something to do with an AWS service.

We don’t want to be reliant on ANY cloud provider; we have our own hardware and will do co-location in a data center. And we want to keep it this way.

My apologies that it doesn’t apply to your case.

I absolutely think all MFEs should somehow be bundled in one container, with a common node_modules folder. I now watch with despair as every Tutor update invokes “npm install” umpteen times. Maybe a completely separate build environment outside Docker would be a solution?

Hmmmm, this should not be the case. MFEs need to be rebuilt (with npm run build) every time their configuration changes, by definition, but rebuilding should not trigger calls to npm install. Can you give more details on what exactly is going on? Maybe in a separate topic on the Tutor forums: https://discuss.overhang.io/

I’m out with just my tablet right now, but what I see is that every time there’s a new Tutor release, after downloading it and running “tutor local quickstart”, the node modules get downloaded many times and finally copied.