Learning Core Early Thoughts

In “Attacking the Monolith by Extracting a Core”, I floated the idea of building a new repository for a Learning Core Platform. I’ve since added a roadmap item for it:


The goals, quoted from that roadmap ticket:

  1. Refactor the Open edX LMS to enable more dynamic behavior in a scalable way.
    New features like V2 Content Libraries and Effort Estimation need a more scalable implementation than what exists in courseware today.
  2. Provide an easier, more reliable path for extensions developers.
    Extensions authors should be able to build and test against a smaller repo than edx-platform.
  3. Accelerate monolith breakup.
    By delivering incremental improvements to extensibility and performance, we have more reason to work on monolith breakup than our previous “extract all the non-core things” approach, where the benefits were more back-loaded.
  4. Improve separation of edX business concepts from the learning platform.
    No edX-specific logic would come over to the new repo.
  5. Advance the decoupling of Studio and the LMS.
    Part of this work includes building a new core content data model for the LMS, which would replace the current shared model with Studio.
  6. Create a lightweight foundation for entirely new learning experiences.
    This would allow groups like LabXchange to make use of helpful infrastructure and concepts, without having to simultaneously inherit all the technical debt and weight of edx-platform.


I’ve created an openedx-learning repo for this and checked in some very rough prototyping code. I’ve also got a lot of changes to the data model that I haven’t pushed up yet. It is absolutely not worth trying to run or play with at this point.

I also have a GitHub issue where I’m hashing over some content data modeling considerations, but again, this is very rough, and it currently reads much more like a stream of consciousness than a proposal. I don’t think it is worth reviewing right now. I’m posting about it in the spirit of transparency, but the data model is a half-baked mess, and the terminology for various concepts shifts several times over the course of the ticket. I probably need another few days of work to hammer it into a coherent proposal, and I’m on PTO next week. You are welcome to read and comment if you’re interested, but it may not be a great use of your time.


The main things I’m looking for right now are:

  1. Use cases around versioned content as it relates to unit composition (figuring out what things are in the unit for which people) and v2 content library usage.
  2. Information about desired content groupings and structures, i.e. anything that would deviate from Open edX’s typical Section → Subsection → Unit → Module hierarchy.
  3. Weird content edge cases.

Some specific examples of edge cases that I’m trying to better understand:

  • How important is the ability to change the UsageKey of a piece of content and not break state stored against that content (e.g. completion, score, XBlock user state)?
  • Aside from ProblemBuilder, are there non-edx-platform XBlocks that logically nest other XBlocks inside themselves below the Unit level?
  • Are Units conceptually flat lists without nesting? We currently have XBlocks like SplitTest that will nest entire units within units, but that serves mostly as a switching mechanism so that the block can change what gets displayed. From the user’s point of view, it’s still a flat list.
  • Does anyone use unit-level or module-level start/due dates?
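
To make the "flat list" question about Units more concrete, here is a minimal sketch of the idea that a SplitTest-style block is only a switching mechanism. Everything in it is hypothetical and for illustration only: the `Leaf` and `Switch` types, the `flatten_unit` function, and the plain-string usage keys are not taken from edx-platform or the openedx-learning repo. It assumes each switching block picks exactly one branch per user, and that what the learner sees is the resulting flat list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class Leaf:
    """A piece of content the learner actually sees (hypothetical type)."""
    usage_key: str


@dataclass
class Switch:
    """A SplitTest-like container that shows one branch per user (hypothetical type)."""
    usage_key: str
    branches: Dict[str, List["Node"]]  # group name -> children shown to that group


Node = Union[Leaf, Switch]


def flatten_unit(children: List[Node], group_for: Callable[[str], str]) -> List[Leaf]:
    """Resolve every Switch for one user and return the flat list they would see.

    `group_for` maps a Switch's usage key to the group name chosen for the user.
    A Switch with no branch for that group contributes nothing.
    """
    flat: List[Leaf] = []
    for child in children:
        if isinstance(child, Switch):
            chosen = child.branches.get(group_for(child.usage_key), [])
            # Recurse so that nested switches also collapse into the flat list.
            flat.extend(flatten_unit(chosen, group_for))
        else:
            flat.append(child)
    return flat


# Example unit: the stored structure is nested, but each learner sees a flat list.
unit: List[Node] = [
    Leaf("video_1"),
    Switch("split_1", {
        "A": [Leaf("problem_a")],
        "B": [Leaf("problem_b1"), Leaf("problem_b2")],
    }),
    Leaf("html_1"),
]

keys_for_group_b = [leaf.usage_key for leaf in flatten_unit(unit, lambda _key: "B")]
```

If Units really are flat from the learner's point of view, a data model could store the per-user result of something like `flatten_unit` rather than supporting arbitrary nesting. Whether that holds for real-world content is exactly the assumption the questions above are trying to check.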

The initial design of the courseware sidestepped a lot of these data modeling questions by reading everything into memory and defining really general interfaces. This was great for prototyping, but it has caused us a lot of grief around performance, reliability, introspection, and complexity, and it has led us to add layer after layer of patchwork fixes over the years.

Any new Learning Core data model needs to accommodate our existing use cases, as well as use cases around new work in progress (like v2 content libraries). My hope is that we can simplify this requirement by dropping some of the extra power that I don’t think anyone actually uses, like being able to set due dates at any level of the hierarchy or being able to arbitrarily nest units (or nest sequences in units). But it’s kind of a wild world of content out there, and I want to check some of my assumptions on what pieces of functionality are still valuable.

Comments in this thread would be greatly appreciated. Thanks, folks.