TL;DR - we need infrastructure for running two types of edx-platform CI builds and developer setups - one minimal with no plugins installed and one “batteries included”, and this will help improve the architecture.
Hey folks, something that was on my mind this morning:
Part 1: Django apps in separate repos don’t help
Somehow I missed this, but last year there was a nice ADR “Guidelines for new edx-platform applications” from @Rebecca_S_Graber which basically says “try to build new django apps in a separate repository, not within edx-platform, unless they absolutely need to import code from edx-platform, or they provide core functionality.”
I think that this ADR doesn’t address a big issue though, which is this: regardless of whether a django app is located in edx-platform or in a separate repository, if it’s used in the edx-platform codebase, it becomes “part of the problem” - it gets installed by default along with all of its dependencies, and it essentially becomes part of the monolith.
For a case in point, look at edx-enterprise. Certainly it’s something that’s non-core and that most installations of Open edX are not using. It’s been developed in a separate repository, which is nice. But it’s installed by default and - importantly - today you cannot run edx-platform without having it installed. There are direct imports (
from enterprise.models ...,
from enterprise.constants ...) in core djangoapps like
user_api, and so on). So at this point, it is actually part of the monolith; it’s just distributed in a separate repository. In this case I would argue the separate repo actually means more work for developers without any benefit.
I’m kind of a monorepo fan, since you typically use one PR per feature instead of needing to coordinate many PRs across different repos for things like enterprise functionality, but I’m not here to argue for a monorepo. I just want to point out that moving code into a separate repository doesn’t inherently make the architecture any better and in my personal opinion actually makes it worse (in cases like enterprise where the code isn’t re-used in any other projects).
What is a clear win on the other hand would be to keep the set of installed django apps, libraries, and plugins required for the platform to boot up and run to an absolute minimum. Take edx-enterprise - it’s good that it’s built in as a django app plugin, but it never should have been added to
base.in because that opened the door to making edx-platform depend on enterprise, and now you can’t choose to uninstall it even if you don’t want it, even though it’s nominally a “plugin”.
So: as far as I can tell, edx-enterprise is in complete compliance with the ADR, yet in retrospect I don’t think it’s been done “the right way”.
Part 2: Python packages required but not used
Now, hold that thought while I bring up a second point. @regis and @kmccormick have been leading the charge on reducing platform build times (see e.g. Is building Docker images taking too long for you? Convert GitHub dependencies to PyPI dependencies wherever possible · Issue #153 · openedx/wg-developer-experience · GitHub etc.), and one contributor is the huge number of dependencies that edx-platform has. So I started reviewing the python dependencies in base.in, and a few things jumped out at me:
algoliasearch # Algolia’s API client, used for "Learner Recommendations" analytics-python # Used for Segment analytics edx-braze-client # "Customer Engagement Platform" edx-enterprise # Functionality for B2B course sales optimizely-sdk # Optimizely full stack SDK for Python py2neo # Neo4j graph DB driver used for Coursegraph
^ These all seems like things that are only used by edx.org or perhaps a handful of instances, not core functionality.
So what, you might think? 6 packages is not a big deal out of the big list of dependencies. But if you look at those 6 packages (ignoring the XBlocks for now), you can see that they add quite a few dependencies in turn:
backoff==1.10.0 # via analytics-python monotonic==1.6 # via analytics-python + py2neo edx-rbac==1.7.0 # via edx-enterprise edx-tincan-py35==1.0.0 # via edx-enterprise jsondiff==2.0.0 # via edx-enterprise pgpy==0.6.0 # via edx-enterprise snowflake-connector-python==3.0.0 # via edx-enterprise testfixtures==7.1.0 # via edx-enterprise django-cache-memoize==0.1.10 # via edx-enterprise django-multi-email-field==0.6.2 # via edx-enterprise django-object-actions==4.1.0 # via edx-enterprise djangorestframework-xml==2.0.0 # via edx-enterprise jsonschema==4.17.3 # via optimizely-sdk pyopenssl==22.0.0 # via optimizely-sdk + snowflake-connector-python (edx-enterprise) pyrsistent==0.19.3 # via jsonschema (optimizely-sdk) + optimizely-sdk pansi==2020.7.3 # via py2neo interchange==2021.0.4 # via py2neo importlib-resources==5.12.0 # via jsonschema (optimizely-sdk) pkgutil-resolve-name==1.3.10 # via jsonschema (optimizely-sdk) aniso8601==9.0.1 # via edx-tincan-py35
And that’s not even the exhaustive dependency tree, just the ones that I could quickly spot that wouldn’t be needed if those 6 packages like
algoliasearch weren’t installed by default.
There are also an bunch of XBlocks are not considered “core” yet are installed by default:
crowdsourcehinter-xblock recommender-xblock # https://github.com/edx/RecommenderXBlock done-xblock xblock-google-drive # XBlock for google docs and calendar xblock-poll # Xblock for polling users
Part 3: What to do
Now, of course there are significant reasons why we’re in the current situation today. Some that I can think of in particular are:
- the hooks/events framework and the event bus did not exist in the past, and these are critical tools for de-coupling apps from edx-platform
- A truly minimal default configuration of Open edX is a pain for developers - you have to do tons of setup before you can actually work on features.
But I think another part of the reason is that we don’t have infrastructure for a smaller core. If you develop something in edx-platform (e.g. the “Learner Recommendations” feature), you have two options: either (1) it’s not installed by default (so you cannot include proper tests for it in edx-platform, so other teams might change edx-platform in ways that break your feature), or (2) you install it by default, you get test coverage, and you just have to accept adding libraries like
algoliasearch to the platform’s core requirements, even though it’s not really core. The build is slower for everyone.
(Side note: @dave has a great plan to address this via Learning Core - , ,  but it’s a long-term project.)
I want to propose a change to our CI and Tutor (/devstack/etc.) infrastructure: there should be a “minimal” build and a “default” build.
With the “minimal” build, every single plugin and feature that’s not core should be uninstalled - no enterprise, no xblock-poll, no ORA2, no Segment/Optimizely/Braze integrations, no proctoring, no third party auth plugins, no Name Affirmation, no discussions plugins, etc. Not just disabled but fully uninstalled, to ensure nobody is importing from them in core code.
With the “default” build, all of those things should be installed. On devstack they should perhaps even be configured and working by default; on CI they should be enabled on a test-by-test basis.
In CI, platform tests should be run against both builds; the “default” build will include more tests because the installed plugins will install their own tests too.
The minimal build serves several purposes:
- It distinguishes in code what’s “truly” core vs. what’s not “truly” core (or what should be)
- It ensures that the platform runs and all the core tests pass, when every plugin is turned off - preventing accidental creep of imports from plugins or third party code into the core.
- It allows writing tests that run on CI for every edx-platform PR but which depend on plugins or third party code, without having those dependencies added to the “true” core.
- It may allow a lightweight docker base layer that “default” or “custom” layers can be built on top off more efficiently than the current layers.
And the default build is all about Developer Experience and API stability. You can save tons of time when everything is ready to go out of the box, and you can feel better about stability when tests from all the major commonly used plugins are part of the core test suite.
Importantly, we could take one step toward this today, simply by splitting base.in into two files, e.g.
default.in. Some things like
crowdsourcehinter-xblock could probably be moved into default.in right away without any issues, while others like
enterprise will require a lot of work. But having those two files rather than one lets us document the problem, label things we want to move out of base.in, work on it step by step, and avoid regressions. (By comparison someone could do the work today to move a django app out of edx-platform to a separate repo but without a minimal CI build to enforce the separation, imports could creep back in, à la enterprise.) As another example, with such a setup, you could write a CI check that requires any additions to
base.in have an accompanying ADR.
Setting up the separate
minimal build for CI can also be done quite easily, but if the delta between minimal and default is small at first, it may be seen as a waste of resources. But my opinion is that it’s worth setting such infrastructure up so we have a way to see, measure, and fix the root problem.