Introduction
Working on Open edX in a constrained environment is challenging. As one struggles to develop quality features and deliver them quickly, it gets frustrating to wait for the same repetitive actions to complete over and over.
Goals
My goal with this proposal is to reduce Tutor's image build cost by improving support for local caching of downloaded packages, images, etc. When this is done, it becomes possible to:
- save download time.
- save bandwidth usage.
- provide simple access to private/unreleased resources.
I’m sharing this work-in-progress in the hope of:
- someone finding it useful.
- getting enough feedback to improve the implementation quality.
- getting the idea itself approved as a Tutor enhancement. Since the changes it implies cannot be shipped as plugins, this is a “take it or leave it” situation.
The Total Cache Journey
1. What is downloaded?
First, I explored the Dockerfile templates present in Tutor and Tutor plugins, then compiled the following (non-exhaustive) list of the downloadables I’ve found so far:
- docker base images.
- apt packages.
- pip requirements.
- npm packages.
- git repositories (mostly covered already, thanks to the DevExp project to convert GitHub dependencies to PyPI dependencies).
- pyenv downloaded binaries.
- nodeenv downloaded binaries.
- ruby gems (forum only).
- gradle artifacts (Android only).
- translation files (openedx-i18n).
- dockerize.
- specific files/patches.
Feel free to add anything I’ve missed.
2. Pull-Through Cache configuration
By using PTC (Pull-Through Cache) proxies, we can cover most of the downloads above.
Here is what I’ve achieved so far; I hope anyone with the relevant knowledge can help improve coverage:
- docker: run the official Docker registry image and configure your host’s Docker daemon to use it as a registry mirror. However, this configuration doesn’t allow working completely offline.
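As an illustration, assuming the local registry mirror listens on `localhost:5000` (an example address, not something the proposal prescribes), the Docker daemon can be pointed at it via `/etc/docker/daemon.json`:

```json
{
  "registry-mirrors": ["http://localhost:5000"]
}
```

After editing the file, restart the Docker daemon (e.g. `systemctl restart docker` on Linux) so the mirror takes effect.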
- apt: run an apt-cacher-ng server image to cache the results of `apt update` and `apt install`.
- pip: run a pypicloud image, and configure it to cache pip requirements.
- npm: run a verdaccio image, and configure it to cache npm packages.
- ruby gems: I tried to run a Docker image of gemstash, but I haven’t succeeded in building it yet.
I’ve grouped all the working PTC proxies in a single repository named (you guessed it) ptc-proxies. It’s still a work in progress, but it’s ready to use as-is in a local environment.
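For illustration, a minimal docker-compose file grouping such proxies might look like the sketch below. The image names and the registry’s `REGISTRY_PROXY_REMOTEURL` proxy mode are real, but the specific ports and the apt-cacher-ng image choice are my own assumptions — the actual ptc-proxies repository may differ:

```yaml
services:
  registry:
    image: registry:2
    environment:
      # Turn the official registry into a pull-through cache of Docker Hub
      REGISTRY_PROXY_REMOTEURL: "https://registry-1.docker.io"
    ports:
      - "5000:5000"

  apt-cacher-ng:
    image: sameersbn/apt-cacher-ng  # community image (assumption)
    ports:
      - "3142:3142"  # apt-cacher-ng's default port

  verdaccio:
    image: verdaccio/verdaccio
    ports:
      - "4873:4873"  # verdaccio's default port
```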
3. Benefiting from PTC proxies
You can benefit from the PTC proxies’ caching capabilities when running `tutor images build` or `tutor dev/local start`, with some additional configuration and code modifications.
3.1 Configurations
I’ve added default configurations to Tutor’s codebase in tutor/templates/config/defaults.yml:
- apt: added `APT_PROXY_URL`, set to `null`.
- pip: added `PIP_INDEX_URL`, set to `https://pypi.org/simple`.
- npm: Tutor and Tutor plugins already define the `NPM_REGISTRY` configuration, set to `https://registry.npmjs.org/`. However, not all plugins used this configuration at build time. Now the support is complete, with these 3 PRs merged to tutor-ecommerce, tutor-mfe and tutor-discovery.
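With those defaults in place, pointing a local installation at your proxies could be a matter of overriding them in Tutor’s config.yml. The addresses below are purely illustrative (a pypicloud instance on its default port 6543, apt-cacher-ng on 3142, verdaccio on 4873):

```yaml
# Example overrides in config.yml (addresses are illustrative;
# point them at wherever your own proxies actually run)
APT_PROXY_URL: "http://192.168.1.10:3142"
PIP_INDEX_URL: "http://192.168.1.10:6543/simple"
NPM_REGISTRY: "http://192.168.1.10:4873"
```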
3.2 Code changes:
a. Build options:
When the configurations above are set to local domains or IP addresses, Docker will require one of the following:
- adding custom domains with the `docker build --add-host` option.
- adding custom network(s) where the proxies reside.
- using the `docker build --network host` option to provide `docker build` with access to the host machine’s network.
I chose the host-network method, which requires the least configuration. Please note that host networking is used only during the build process, so it doesn’t compromise the security of running containers.
The change is introduced in the `tutor images build` command and in the environment’s docker-compose.yml, under every service’s `build` section.
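For instance, the generated docker-compose.yml build sections would gain a `network` key along these lines (a sketch under my assumptions about the service name and context path; the actual template change may differ):

```yaml
services:
  lms:
    build:
      context: ../build/openedx
      # Give the image build access to proxies reachable from the host
      network: host
```

Compose supports `build.network` natively, so no extra tooling is needed for the `dev`/`local` case.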
The only drawback I’ve found is that custom networking, custom hosts and custom DNS all affect the way Docker generates cache keys, leading to cache duplication. In other words, if you have an existing build cache and then modify any of the options above, your next build will start from scratch even if it generates exactly the same image layers.
b. Dockerfile template additions:
We wrap package download instructions with our PTC proxy definition (i.e. we define a proxy only when it’s needed, and undefine it when the job is done). This helps limit cache duplication.
Example for apt:

```dockerfile
# Begin proxy
{% if APT_PROXY_URL -%}
RUN echo 'Acquire::{{ APT_PROXY_URL | url_part('scheme') }}::Proxy "{{ APT_PROXY_URL }}";' \
    > /etc/apt/apt.conf.d/00proxy
{%- endif %}

# apt instructions
RUN apt update && \
    apt install -y build-essential curl git language-pack-en

# End proxy
{% if APT_PROXY_URL -%}
RUN rm /etc/apt/apt.conf.d/00proxy
{%- endif %}
```
Example for pip:

```dockerfile
# Begin proxy
ENV PIP_INDEX_URL={{ PIP_INDEX_URL }} \
    PIP_TRUSTED_HOST={{ PIP_INDEX_URL | url_part('netloc') }}

# pip instructions
RUN pip install setuptools==65.5.1 pip==22.3.1 wheel==0.38.4
RUN pip install -r /tmp/base.txt
RUN pip install django-redis==5.2.0
RUN pip install uwsgi==2.0.21

# End proxy
ENV PIP_INDEX_URL= \
    PIP_TRUSTED_HOST=
```
By the way, `url_part` is a custom Jinja template filter that extracts a specific part of a URL (e.g. protocol, domain, etc.).
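For clarity, such a filter can be a thin wrapper around Python’s urllib.parse — the sketch below is my own illustration, and the actual implementation in the branch may differ (including how the filter gets registered):

```python
from urllib.parse import urlparse

def url_part(url: str, part: str) -> str:
    """Jinja filter: return one component of a URL.

    'part' maps to urlparse() attributes, e.g. 'scheme' -> "http",
    'netloc' -> "host:port", or 'path', 'hostname', 'port', etc.
    """
    return getattr(urlparse(url), part)

# Registration would look something like this (environment name hypothetical):
# env.filters["url_part"] = url_part

print(url_part("http://192.168.1.10:3142", "scheme"))  # -> http
print(url_part("http://192.168.1.10:3142", "netloc"))  # -> 192.168.1.10:3142
```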
Quick Notes
- For `apt`, we could have used the `HTTP_PROXY` or `HTTPS_PROXY` environment variables, but we avoided them since they are also detected by `pip`.
- This is a working example. I’ve tried using a Docker `ARG` for pip and it didn’t work for me. I’ve also tried using files from the build context, but that was too bulky. I believe there are much cleaner ways to achieve the same results.
Conclusion
That was my own story of endless struggle with Tutor builds.
Please feel free to:
- express your opinion on whether or not this improvement should find its place in Tutor and Tutor plugins.
- use the resources (the ptc-proxies repo + the proof-of-concept tutor:tep-build-cache branch) for your own benefit.
I’m expecting feedback from our @tutor-maintainers, fellow #tutor users, #working-groups:dev-experience members and anyone interested.