Tutor Enhancement Proposal (TEP): Improve support for build caching

Introduction

Working on Open edX in a constrained environment is challenging. As one struggles to develop quality features and deliver them quickly, it gets annoying to wait for repetitive actions to complete.

Goals

My goal with this proposal is to reduce Tutor’s image build cost by improving support for local caching of downloaded packages, images, etc. With this in place, it becomes possible to:

  • save download time.
  • save bandwidth usage.
  • provide simple access to private/unreleased resources.

I’m sharing this work in progress in the hope that:

  • someone finds it useful.
  • I get enough feedback to improve the implementation quality.
  • the idea itself gets approved as a Tutor enhancement. Since the changes implied by this cannot be exported as plugins, this is a “take it or leave it” situation.

The Total Cache Journey

1. What is downloaded?

First, I explored the Dockerfile templates present in Tutor and Tutor plugins, then made the following (non-exhaustive) list of the downloadables I’ve found so far:

  • docker base images.
  • apt packages.
  • pip requirements.
  • npm packages.
  • git repositories (mostly covered thanks to the DevExp project to convert GitHub dependencies to PyPI dependencies).
  • pyenv downloaded binaries.
  • nodeenv downloaded binaries.
  • ruby gems (forum only).
  • gradle artifacts (android only)
  • translation files (openedx-i18n)
  • dockerize.
  • specific files/patches.
    Feel free to add anything I’ve missed.

2. Pull-Through Cache configuration

By using PTC (Pull-Through Cache) proxies, we can cover most of the downloads above.
Here is what I’ve achieved so far and I hope anyone with the knowledge can help improve coverage:

  • :warning: docker: run the official docker registry image and configure your host’s docker daemon to use it as a registry mirror on your local machine. However, this configuration doesn’t allow working completely offline.

  • :white_check_mark: apt: run an apt-cacher-ng server image to cache the result of apt install and apt update

  • :white_check_mark: pip: run a pypicloud image, and configure it to cache pip requirements.

  • :white_check_mark: npm: run a verdaccio image, and configure it to cache npm packages.

  • :stop_sign: ruby gems: I tried to run a docker image of gemstash, but I haven’t succeeded in building it yet.
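
For the docker item above, the daemon side of the setup comes down to a `registry-mirrors` entry in the daemon configuration file (typically /etc/docker/daemon.json on Linux), followed by a daemon restart. Here is a minimal Python sketch of that change. The mirror address `http://localhost:5000` is an assumption for illustration, not something taken from ptc-proxies.

```python
import json


def with_registry_mirror(daemon_config: dict, mirror_url: str) -> dict:
    """Return a copy of a Docker daemon config with mirror_url listed as a registry mirror."""
    updated = dict(daemon_config)
    mirrors = list(updated.get("registry-mirrors", []))
    if mirror_url not in mirrors:
        mirrors.append(mirror_url)
    updated["registry-mirrors"] = mirrors
    return updated


# Hypothetical local mirror address: use whatever your registry container listens on.
config = with_registry_mirror({}, "http://localhost:5000")
print(json.dumps(config, indent=2))
```

The function leaves any other keys already present in the configuration untouched, and calling it twice with the same URL does not duplicate the entry.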

I’ve grouped all the working PTC proxies in a single repository named (you guessed it) ptc-proxies. It’s still a work in progress, but it’s ready to use as is in a local environment.

3. Benefit from PTC proxies

You can benefit from the caching capabilities of PTC proxies when running tutor images build or tutor dev/local start, with some additional configuration and code modifications.

3.1 Configurations

I’ve added default configurations to Tutor’s codebase in tutor/templates/config/defaults.yml

  • apt: added APT_PROXY_URL set to null.
  • pip: added PIP_INDEX_URL set to https://pypi.org/simple
  • :white_check_mark: npm: Tutor and Tutor plugins already define the NPM_REGISTRY configuration, set to https://registry.npmjs.org/. However, not all plugins used this configuration at build time. Support is now complete with these 3 PRs merged to tutor-ecommerce, tutor-mfe and tutor-discovery.

3.2 Code changes:

a. Build options:
When the configurations above are set to local domains or IP addresses, docker will require one of the following:

  • adding custom domains with the docker build --add-host option.
  • adding one or more custom networks where the proxies reside.
  • using the docker build --network host option to give docker build access to the host machine’s network.

I chose the host network method which requires less configuration. Please note that host networking is used only during the build process so it doesn’t compromise the security of running containers.
The change is introduced at the tutor images build command and in the environment’s ‘docker-compose.yml’ under every service’s ‘build’ section.

The only drawback I’ve found is that custom networking, custom hosts, and custom DNS all affect how docker generates cache keys, which leads to cache duplication. In other words, if you have an existing build cache and then modify any of the options above, your next build will start from scratch even if it generates the exact same image layers.
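
To make the three networking options concrete, here is a small Python sketch that turns them into extra `docker build` arguments. The function and parameter names are hypothetical and are not part of Tutor. Only the `--add-host` and `--network` flags themselves come from docker.

```python
from typing import Dict, List, Optional


def docker_build_network_args(
    use_host_network: bool = False,
    extra_hosts: Optional[Dict[str, str]] = None,
    network: Optional[str] = None,
) -> List[str]:
    """Build the extra `docker build` arguments needed to reach local proxies."""
    if use_host_network and network:
        raise ValueError("choose either the host network or a custom network, not both")
    args: List[str] = []
    # Option 1: map custom domain names to IP addresses.
    for hostname, ip in sorted((extra_hosts or {}).items()):
        args.append(f"--add-host={hostname}:{ip}")
    if use_host_network:
        # Option 3: share the host machine's network during the build.
        args.append("--network=host")
    elif network:
        # Option 2: attach the build to a custom network where the proxies reside.
        args.append(f"--network={network}")
    return args


print(docker_build_network_args(use_host_network=True))
print(docker_build_network_args(extra_hosts={"pypi.local": "172.17.0.1"}))
```

Keeping these arguments stable between builds matters for the reason given above: changing any of them changes the cache keys.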

b. Dockerfile template additions:
We wrap package download instructions with our PTC proxy definition (i.e. we define a proxy only when it’s needed, and undefine it when the job is done). This helps limit cache duplication.

Example for apt

# Begin proxy
{% if APT_PROXY_URL -%}
RUN echo 'Acquire::{{ APT_PROXY_URL | url_part('scheme') }}::Proxy "{{ APT_PROXY_URL }}";' \
    > /etc/apt/apt.conf.d/00proxy 
{%- endif %}

# apt instructions
RUN apt update && \
    apt install -y build-essential curl git language-pack-en

# End proxy
{% if APT_PROXY_URL -%}
RUN rm /etc/apt/apt.conf.d/00proxy
{%- endif %}

Example for pip

# Begin proxy
ENV PIP_INDEX_URL={{ PIP_INDEX_URL }} \
    PIP_TRUSTED_HOST={{ PIP_INDEX_URL | url_part('netloc') }}

# pip instructions
RUN pip install setuptools==65.5.1 pip==22.3.1 wheel==0.38.4
RUN pip install -r /tmp/base.txt
RUN pip install django-redis==5.2.0
RUN pip install uwsgi==2.0.21

# End proxy
ENV PIP_INDEX_URL= \
    PIP_TRUSTED_HOST=

BTW, url_part is a custom jinja template filter to get a specific part of a url (e.g. protocol, domain, etc).
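
For readers who want to reproduce the filter, here is a minimal sketch of what url_part might look like, built on Python’s standard urllib.parse. This is an assumption about its behaviour based on the two usages above ('scheme' and 'netloc'), not the code from the branch. The proxy hostname is hypothetical.

```python
from urllib.parse import urlsplit

# Components this sketch agrees to expose (an assumption, not the original filter's list).
ALLOWED_PARTS = {"scheme", "netloc", "hostname", "path", "query", "fragment"}


def url_part(url: str, part: str) -> str:
    """Return one component of a URL, e.g. 'scheme' or 'netloc'."""
    if part not in ALLOWED_PARTS:
        raise ValueError(f"unsupported URL part: {part}")
    value = getattr(urlsplit(url), part)
    return value or ""


def apt_proxy_line(apt_proxy_url: str) -> str:
    """Render the apt configuration line written by the Dockerfile template above."""
    scheme = url_part(apt_proxy_url, "scheme")
    return f'Acquire::{scheme}::Proxy "{apt_proxy_url}";'


# Hypothetical proxy address, for illustration only.
print(url_part("http://apt-proxy.local:3142", "netloc"))
print(apt_proxy_line("http://apt-proxy.local:3142"))
```

With plain Jinja2, a function like this can be registered on an environment through its `filters` mapping, for example `environment.filters["url_part"] = url_part`.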

Quick Notes

  • For apt, we could have used HTTP_PROXY or HTTPS_PROXY environment variables, but we avoided them since they are also detected by pip.
  • This is a working example. I’ve tried using docker ARG for pip and it didn’t work for me. I’ve also tried using files from context but that was too bulky. I believe there are much cleaner ways to achieve the same results.

Conclusion

That was my own story of endless struggle with tutor builds.
Please feel free to:

I’m expecting feedback from our @tutor-maintainers, fellow #tutor users, #working-groups:dev-experience members and anyone interested.


Hi @ARMBouhali

Feel free to add this to the next BTR WG agenda if you want to talk about it for a few minutes.

https://openedx.atlassian.net/wiki/spaces/COMM/pages/3620012033/2023-01-09+BTR+Meeting+notes

Hey @ARMBouhali, this is very cool. I’ve attempted multiple times to leverage caching during build but never went as far as you.

Now, you know what I’m going to say, right? :stuck_out_tongue: This is great, and I want it. But can we just make it into a plugin?

Let me clarify:

  1. I think it makes sense to add the APT_PROXY_URL and PIP_INDEX_URL settings to Tutor core.
  2. The core Tutor images along with official plugins should strive to make use of these new settings when they are defined, thus leveraging cache. This means that we would add the changes to the Dockerfile templates from your branch (feat: proof of concept for build caching · overhangio/tutor@94063b0 · GitHub).
  3. Deploying apt/npm/pypi caching services should not be handled by Tutor core, but by a plugin. If not, then at the very least there should be instructions on how to do it. Maybe these instructions will simply point users to the upstream project documentation.
  4. Users should always have the possibility to deploy caching services to a remote server. Thus docker build --network host should not be the default, even when caching services are enabled.

Additional notes:

  • It is unnecessary to add an extra option to tutor images build. Instead, we can write: tutor images build --docker-arg "--network=host" openedx.
  • The --network=host option may be automatically added to docker build by a Tutor plugin, based on the value of the BUILD_USE_HOST_NETWORK setting. This setting should be introduced by the same Tutor plugin.
  • What’s the use of RUN rm /etc/apt/apt.conf.d/00proxy?
  • url_part should be added to Tutor core (with maybe a different name)

@ARMBouhali I like your proposal a lot, and I agree with @regis that the new caching services would best be added as a plugin.

Devstack, for reference, uses DevPI as a PTC for pip requirements. I imagine it saved me a lot of time when I used Devstack, and I can’t recall any times it caused problems.

By the way, I also happen to be looking into Tutor image caching improvements, but at a different layer from this one: I want to figure out why Docker isn’t using its cache of previously-pulled layers when building modified openedx images.


@kmccormick Thanks for the information.
Actually, I wasn’t intending to build a Tutor plugin out of fennec-tech/ptc-proxies, since it already does the job without being part of the Tutor plugin family. But knowing Open edX Devstack does use DevPi for that purpose encourages me to make it a plugin if this would help close the gap between Tutor and Devstack.

@regis again very grateful for your thorough feedback.
I’ll be preparing a PR for the necessary changes to Tutor core to support the use of PTC proxies.

I agree with almost everything you said. So I’ll dedicate my reply to some notes and exceptions:


I agree. This can be dropped. It was just part of the exploration process.


I’m interested in knowing how to add this through a plugin instead of editing tutor core code for the tutor images build command.
However, BUILD_USE_HOST_NETWORK should be defined in Tutor core to be used in docker-compose-{dev or all}-services template, responsible for building images using tutor <env> start which calls docker compose up. We can discuss this further when I post the PR in question.


This is the docker build instruction to opt out of apt download caching.

  • NPM_REGISTRY is defined with an ARG instruction, thus not persisted after the build.
  • However, pip and apt proxy configurations would survive the build if we do not ‘undo’ them with opt-out instructions. So I need to work out a cleaner way that applies proxy config only when it’s needed, without further contaminating the build cache of the following Dockerfile instructions or the built image itself.
  • I first tried adding configuration templates to the build context dir to get cleaner code, but replicating this across every plugin was too much.

I believe proxy configuration should be applied in as fine-grained a way as possible (at each targeted build instruction), so that it neither adds unnecessary docker build steps to the cache nor affects the cache of the following instructions. For this, I have 2 options, and I’m not sure which would be best:

  1. Either use a custom Jinja function added to the instruction (see below).
  2. Or do some kind of template post-processing which parses Dockerfile lines for search and replace. I’m not sure how to do that, though, and I doubt Tutor supports template post-render actions.
# Original
RUN pip install [...]
# With custom function (option 1)
RUN pip install [...] {{ with_proxy('pip') }}
# Rendered either with custom function above (option 1) or with post processing (option 2)
RUN pip install [...] --index-url=<your_proxy> --trusted-host=<your_proxy_domain>
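
Option 1 could be sketched in Python roughly as follows. This is only an illustration under assumptions. Here with_proxy takes the configuration as an explicit second argument, whereas the template call above passes only 'pip', so a real Jinja function would need to read the configuration from the rendering context. The proxy URL is hypothetical.

```python
from urllib.parse import urlsplit

# Default value of PIP_INDEX_URL, as given in the proposal.
DEFAULT_PIP_INDEX_URL = "https://pypi.org/simple"


def with_proxy(tool: str, config: dict) -> str:
    """Return the extra command-line options that point `tool` at its PTC proxy."""
    if tool != "pip":
        raise ValueError(f"no proxy options defined for: {tool}")
    index_url = config.get("PIP_INDEX_URL") or DEFAULT_PIP_INDEX_URL
    if index_url == DEFAULT_PIP_INDEX_URL:
        # No proxy configured: render nothing so the instruction stays unchanged.
        return ""
    trusted_host = urlsplit(index_url).netloc
    return f"--index-url={index_url} --trusted-host={trusted_host}"


# Hypothetical local proxy URL, for illustration only.
options = with_proxy("pip", {"PIP_INDEX_URL": "http://pypi.local:6543/simple"})
print(f"RUN pip install -r /tmp/base.txt {options}".rstrip())
```

Because the options are part of the instruction itself, nothing persists in the image, and no separate begin/end proxy steps are added to the build cache.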

This is really awesome work you have been doing @ARMBouhali.

Regarding caching, I’ve always wondered if there was a way to make use of the cache mount type, but it always seemed too intrusive to me, as it would require changing every RUN instruction that performs some kind of package management. But if the custom filter you are proposing were to be implemented, would it also be a good idea to include a mechanism to allow adding arbitrary options to each instruction?
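
As a rough illustration of such a mechanism, the Python sketch below inserts arbitrary options right after the RUN keyword, which is where BuildKit expects `--mount=type=cache`. The helper name is hypothetical. The cache target /root/.cache/pip assumes the build runs as root with pip’s default cache location.

```python
from typing import List


def add_run_options(instruction: str, options: List[str]) -> str:
    """Insert options right after the RUN keyword of a Dockerfile instruction."""
    parts = instruction.split(None, 1)
    if len(parts) != 2 or parts[0].upper() != "RUN":
        # Not a RUN instruction (or an empty one): leave it untouched.
        return instruction
    keyword, command = parts
    return " ".join([keyword, *options, command])


# Assumed cache location: pip's default cache directory for the root user on Linux.
pip_cache = "--mount=type=cache,target=/root/.cache/pip"
print(add_run_options("RUN pip install -r /tmp/base.txt", [pip_cache]))
```

Instructions other than RUN are returned unchanged, so the helper could be applied across a whole template without affecting the rest of it.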
