When to make a new backend service?

braden · September 14, 2022, 7:55pm

Thanks for listing those reasons; I never really understood until now why it was made as a separate service.

Re your question, the main thing I personally think about is the flow of data.

Looking at the discovery service documentation, here is its explanation:

The distribution of edX’s data has grown over time. Any given feature on edx.org may need information from Studio, the LMS, the Ecommerce service, and/or the Drupal marketing site. Discovery is a data aggregator whose job is to collect, consolidate, and provide access to information from these services.

At one level, that seems very reasonable. And if the data flow were like this:

[Diagram: Untitled drawing]

I think it would be great.

But as I understand it, it’s really more like this in practice:

[Diagram: Untitled drawing (1)]

For example, to define a program, you have to go into the Django admin of the discovery service, so it’s the original source of truth. Then the program is available via the Discovery Catalog API (which may or may not use the Elasticsearch index depending on how you call it). The LMS in turn calls that API, caches the result, and makes it available within the LMS.

This is a complicated flow of data, with several places where things can get out of sync. It’s also a bit difficult to debug because if you’re having some data issue, in a worst case scenario you may have to compare the LMS modulestore, the LMS Course Metadata table, the Discovery service MySQL table, the Discovery Elasticsearch cache, and perhaps even the LMS discovery cache to figure out what’s wrong.

From a user perspective, this can also be a bit strange. If you think of “Programs” as just a way of grouping together sets of courses and selling the set at a discount, and that’s all managed by a Sales team that is unrelated to the course authoring teams, this makes sense. But if you’re an educator thinking of “Programs” as a way to structurally group courses and micro-courses to optimize learning for different target audiences, you’d expect to be able to edit programs in Studio and view program enrollment in the same manner as you view course enrollment. It doesn’t make sense that to create a new course run you use Studio but to create a new course group (Program) you have to manually enter the URL of some entirely different website and use the admin backend to do so.

So while I feel like I understand why things are as they are today, and it’s not unreasonable, if it were me doing it today I would consider it a core responsibility to have a pluggable API for maintaining metadata about all courses, programs, mini-courses, libraries, etc., and which lives in the LMS and mostly uses foreign keys to ensure data integrity and never be out of date. But I would probably try not to include a search index as part of this core, and have a separate application which pulls data from the core metadata API and stores it in elasticsearch or typesense, to serve end-user searches only.
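To make that split a bit more concrete, here is a minimal sketch of the idea, not a proposal for the actual schema. Everything in it is hypothetical: the table names, `make_core_db`, `list_programs`, and `rebuild_search_index` are made up for illustration, SQLite stands in for the LMS database, and a plain dict stands in for Elasticsearch or Typesense. The point is only that the core refuses inconsistent data via foreign keys, while the search index is a disposable copy rebuilt from the core's read API.

```python
import sqlite3


def make_core_db():
    """Hypothetical core metadata store; SQLite stands in for the LMS database."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    db.executescript("""
        CREATE TABLE course (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE program (id TEXT PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE program_course (
            program_id TEXT NOT NULL REFERENCES program(id),
            course_id  TEXT NOT NULL REFERENCES course(id),
            PRIMARY KEY (program_id, course_id)
        );
    """)
    return db


def list_programs(db):
    """Hypothetical read API of the core: each program with its course titles."""
    rows = db.execute("""
        SELECT p.id, p.title, c.title
        FROM program p
        JOIN program_course pc ON pc.program_id = p.id
        JOIN course c ON c.id = pc.course_id
        ORDER BY p.id, c.id
    """)
    programs = {}
    for program_id, program_title, course_title in rows:
        entry = programs.setdefault(program_id, {"title": program_title, "courses": []})
        entry["courses"].append(course_title)
    return programs


def rebuild_search_index(db):
    """Separate application: pulls from the core API and builds a word -> program ids map."""
    index = {}
    for program_id, program in list_programs(db).items():
        text = program["title"] + " " + " ".join(program["courses"])
        for word in text.lower().split():
            index.setdefault(word, set()).add(program_id)
    return index
```

With this shape, linking a program to a course that doesn't exist fails immediately with `sqlite3.IntegrityError` instead of drifting out of sync, and a stale or broken search index can simply be thrown away and rebuilt.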

This is because Python doesn’t have a good way to indicate a public API vs. private API. But there is a nice solution which can reduce the impact of this issue quite a bit. If you have any interface that makes sense to implement using plugins, you can design the plugin API so that you can’t really call any functions on the plugin directly, but only through a central plugin manager (because you won’t be able to instantiate the plugin and all of its methods are instance methods). An example of this in the platform is that you can’t call the split modulestore API directly from anywhere in the LMS or Studio code, but rather you have to use `from xmodule.modulestore.django import modulestore` and then use its public API, which will in turn call split. We don’t have any problems with people calling into the split code directly because, well, you can’t.

Another alternative which is used in edx-platform is the use of api.py files, although this is less effective.

This is fundamentally a design limitation of Python, which some other languages like Rust and JS do not have. It could be especially problematic in this case if e.g. the LMS and Discovery were using different versions of the Elasticsearch driver. So I think this can be a compelling reason to create a separate service, though it also helps to find ways to keep dependencies to an absolute minimum in either case imho.

I also think that your learning-core project will solve this, by creating a lean, mean core with few dependencies and with thoughtful boundaries around the data.
