New Directory Structure & Public Python APIs for openedx-platform

This is a proposal for a major restructuring of the openedx-platform (formerly edx-platform) repository. I’d originally posted it as a GitHub issue but I’m moving it here to broaden the discussion. Once we have some consensus, I can write this into an ADR and get started on implementation.

Outstanding issues

There is no defined Python API for openedx-platform. Yes, we do have other backend APIs already: openedx-events, openedx-filters, xblock, and soon openedx-learning (a.k.a. learning). These APIs will always be the most stable way to plug into the platform. However, the most advanced plugins will still need to be able to call core openedx-platform functions: enroll a user, create an content library, get the list of organizations, etc. In the current state of affairs, plugins will call those core functions anyway, and then frequently get broken as we develop openedx-platform.

The split between LMS and CMS is more trouble than it is worth. The platform contains bespoke tooling to handle the fact that it is two applications wrapped into one: there is our special manage.py file, the two separate celery configs, the two parallel settings trees, and so on. Each of these has caused a difficult-to-debug production issues at some point. And although we get some architectural benefit from separating LMS (learning) and CMS (authoring) domains, we find that real-world apps usually overlap between these domains, so we end up either with most apps being “shared” between the two sides, or being awkwardly split into 2-3 pieces (see discussions, discussions, and discussions :exploding_head:).

Prior art: how openedx-learning solves this

The Learning Core, a.k.a. openedx-learning, organizes all of its stable Python APIs into modules within openedx_learning.api (example usage). Those API modules simply re-export functions which are defined within openedx_learning.apps. Everything within “apps” is an implementation detail–we maintainers are free to refactor “apps”, as long as we keep “api” stable.

We’re able to maintain good architectural boundaries between the “apps” well orangized by using importlinter rules . The rules specify a layering order, so that low-level apps like “publishing” are not allowed to become dependent on higher-level apps like “backup_restore”. (Note: we’ve begun trying to do this in openedx-platform already, too).

I think this has been working very well, so I propose we adopt something similar for openedx-platform.

Proposed solution for openedx-platform

  1. Create a new directory tree:

    • ./openedx_platform/
      • ./openedx_platform/api/All public Python APIs. We avoid breaking changes to these APIs, and use the DEPR process & a major version bump when they’re unavoidable. There would be no implementations in this folder, just imports and re-exports of functions from apps.
        • The APIs would be organized into sub-modules, like:
          • ./openedx_platform/api/content_libraries.py
          • ./openedx_platform/api/xblock.py
          • ./openedx_platform/api/rbac.py
        • and certain database models would be allowlisted on a case-by-case basis, e.g.:
          • ./openedx_platform/api/content_libraries_models.py
          • ./openedx_platform/api/xblock_models.py
          • ./openedx_platform/api/rbac_models.py
    • ./openedx_platform/apps/ – Future home for all implementations, including models definitions. We would use importlinter’s layering and independence rules to ensure good architectural relationships between different apps.
    • ./openedx_platform/settings/ – Future home for Django settings.
  2. Begin building out ./openedx_platform/api/, starting with Python APIs that we feel are already near-stable, such as content_libraries.

  3. Publish openedx-platform to PyPI, starting at v1.0.0.

  4. Announce that external imports to all other paths (./lms/, ./cms/, ./openedx/, ./common/, and ./xmodule/) are deprecated. Plugins relying on these imports should move to public Python APIs. In tandem, we will encourage development of new stable openedx_platform.api.

  5. After some grace period, begin moving code from the old implementation directories (./lms/, ./cms/, ./openedx/, ./common/, and ./xmodule/) into the new one (./openedx_platform/apps).

What do you think?

This would have both large benefits (more stable APIs, and a more organized monolith), but also a lot of disruption (breaking changes to existing plugins, very painful rebases for forks once we start moving code). I believe the benefits outweigh the intermediate pain, but I want to hear from all of you. Let us know your thoughts below!

4 Likes

Open questions / prior discussion

REST APIs

@braden asked:

Where does the implementation of REST APIs fall under this scenario? I would love if we can move to a model where the REST APIs generally mirror the python APIs 1:1. That’s easiest to do when the REST APIs are only allowed to import the public APIs, rather than living in the apps code and importing internal APIs and models. Ideally we could use django-ninja instead of DRF to make this much nicer and faster."

Data objects

A nice new pattern we’ve seen is defining immutable datatypes in a module called data.py. These datatypes can be returned from Python APIs as an alternative to returning model instances, which leak implementation details to API consumers.

I think these datatypes should be exposed from openedx_learning.api as well, but I’m not sure exactly where.

importlinter

@braden asked, regarding importlinter layers:

 layers = 
    # Layers from high-level to low-level. Imports should only occur from higher to lower. 
    cms.lib.xblock.upstream_sync | openedx.core.djangoapps.content.search | openedx.core.djangoapps.olx_rest_api 
    openedx.core.djangoapps.content_libraries 
    openedx.core.djangoapps.content_staging 
    openedx.core.djangoapps.xblock 
    openedx.core.lib.xblock_serializer 
    openedx.core.djangoapps.content_tagging 

I really like this approach, but I wish that there was a way to make it more obvious which “level” each app is at, other than referring to the linter config file.

For example, how would a dev know that “content_libraries” can depend on “content_staging” but not the other way around?

LMS/CMS

I asked: Will we still want to support running the platform in “LMS mode” and “CMS mode”? If so what’s the right way to support this? What benefits from the LMS/CMS split do we want to retain (e.g. different scaling needs for Studio vs LMS)?

@braden’s response:

Will we though? The code would likely get significantly cleaner and simpler over time if we just combine them into one big headless REST API server. It might be a more efficient use of memory and CPU too.

Otherwise, if we’re going to keep the split, I would actually prefer to see more separation between CMS and LMS, rather than this “mode” toggle on the shared app.

[Regarding scaling needs] We scale that way today because it’s the only option, but I suspect it would be much better if we could, for example, scale up a few lightweight servers that handle only the problem_check and “get unit” endpoints (e.g. during exams). Focusing in on a few hot endpoints and separating them out would allow those processes to use much less memory compared to workers that load into memory anything used by the entire LMS.

We could also work to move some of the slower API endpoints to use asynchronous Django for I/O calls (e.g. the database or the discussions service), which I suspect would be a much bigger win for scalability than the LMS+CMS split ever was.

@dave had concerns about Step 5 of my proposal (the part where we actually break the old imports to lms, cms, openedx, common, and xmodule). We agreed the benefits of Steps 1-4 (the creation of the public APIs) can all happen without actually ever doing Step 5. This would leave the internal openedx-platform implelmentation in a messier end state, but it would avoid breaking existing plugins.

I’m not sure where I stand on this yet–but, I do agree that we would at least avoid breaking old import paths unles there has been a suitable Python API available for at least one named release. That way, plugins would have a chance to migrate before being broken.

@feanil you had some thoughts on DRF vs django-ninja, do you mind adding them here?

@dave also asked about maintenance boundaries. Would openedx_platform.api in its entirety be maintained by one person / team?

My thought is that each sub-API (openedx_platform.api.content_libraries, openedx_platform.api.enrollment, etc.) would be maintained by one person or team. That way, each sub-API will be curated and cared for as a an atomic unit, but the we are able to distribute the load across multiple contributors.

A big +1 from me. My 2c is that this split is a big pain point both from a development perspective and an admin perspective.

As a developer, it’s all one codebase, but I can’t treat it like so because some things only work when running as the LMS or on the CMS, and it’s never clear why it shouldn’t. For example, a plugin I’m working on this week requires hooking into LMS functionality (user registration) and using CMS functionality (the CourseCreator model). Then I’m stuck with a traceback:

RuntimeError: Model class cms.djangoapps.course_creators.models.CourseCreator doesn't declare an explicit app_label and isn't in an application in INSTALLED_APPS.

And now I have to figure out how to get this plugin to run on the cms only while hooking into user registration, and I’m thinking about celery tasks, etc. and it’s so much more complicated.

As an admin, it’s all the same app, but I need to decide whether to run management commands on the LMS or the CMS backend - some work on both, while others only work on one. Likewise for checking logs. I would love to be able to have a single “backend” service running to make all that simpler.

Granted, these perspectives ignore benefits of the split; I’m sure there have been benefits and a good rationale for the original decision for the split. I’m wondering though with the move to MFEs and edx-platform moving toward more of a backend rest api, if it’s time to merge the split.

:slight_smile:

2 Likes