Learning Core Early Thoughts

dave · February 18, 2022, 12:28am

In “Attacking the Monolith by Extracting a Core”, I floated the idea of building a new repository for a Learning Core Platform. I’ve since added a roadmap item for it:

Learning Core Platform Arch Discovery · github.com/openedx/platform-roadmap · opened by ormsbee, 26 Nov 2021, 05:35 PM UTC · label: debt paydown

Extract core learning concepts and data models into a new `openedx-learning` repository, with the goal of creating a new, scalable core platform for learning innovation.

# Goals

1. **Refactor the Open edX LMS to enable more dynamic behavior in a scalable way.** _New features like V2 Content Libraries and Effort Estimation need a more scalable implementation than what exists in courseware today._
2. **Provide an easier, more reliable path for extensions developers.** _Extensions authors should be able to build and test against a smaller repo than edx-platform._
3. **Accelerate monolith breakup.** _By delivering incremental improvements to extensibility and performance, we have more reason to work on monolith breakup than our previous "extract all the non-core things" approach, where the benefits were more back-loaded._
4. **Improve separation of edX business concepts from the learning platform.** _No edX-specific logic would come over to the new repo._
5. **Advance the decoupling of Studio and the LMS.** _Part of this work includes building a new core content data model for the LMS, which would replace the current shared model with Studio._
6. **Create a lightweight foundation for entirely new learning experiences.** _This would allow groups like LabXchange to make use of helpful infrastructure and concepts, without having to simultaneously inherit all the technical debt and weight of edx-platform._

# Major Components

The high-level components would include:

## Composition

What permutation of a single unit does a user see? This would handle things like A/B tests, randomized problem selection, disabling content by enrollment type, adding staff-only debug markup, etc. There should be multiple backends that can render a Unit, with the XBlock runtime being one of them.

## Navigation

Sequence and Unit metadata, outlines, etc. How do you get to a particular piece of content you need to learn from? This would pull in parts of the Learning Sequences API from `edx-platform`.

## Partitioning

Low-level utility that helps determine which users are in which groups for the purposes of A/B testing, enrollment tracks, etc. Used by both Composition and Navigation.

## Policy

Content-related settings as they apply to the site as a whole, organizations, and individual courses. This covers a lot of what Course Overviews and course override waffle flags do today.

## Publishing

Centralized list of Learning Contexts (e.g. Courses, Libraries), their published versions, and the various content-related errors and warnings associated with them. We need this to tackle the major issue we have today around publishing: it's an asynchronous process with many different components that sometimes take minutes to complete and may fail independently, leading to a mixed-published state. This is the most foundational component, and the others will be built on top of it.

## Scheduling

Lower-level library for content scheduling information, likely pulling in most of what is `edx-when` today.

## Discovery/Design Phases

- [x] https://github.com/openedx/openedx-learning/issues/1
- [x] https://github.com/openedx/openedx-learning/issues/4
- [x] https://github.com/openedx/openedx-learning/issues/5
- [ ] Create an MVP XBlock Unit renderer.

# Full Implementation Phases

- [ ] Phase 1: Publishing and Policy _Establish foundation and start experimenting with CourseOverview data commonly wanted by extensions._
- [ ] Phase 2: Partitioning and Composition _This would be useful for both upcoming V2 Content Libraries and Effort Estimation work._
- [ ] Phase 3: Scheduling _Much of this could be ported over from `edx-when`._
- [ ] Phase 4: Navigation _Much of this could be ported over from the `learning_sequences` API._

# Implementation Strategies

The following are some high-level approaches and considerations for this new project.

## Focus on content first.

Content data that lives in Studio is easy to rebuild and backfill into new apps. User data is up to five orders of magnitude larger, and involves much larger challenges in terms of data migration.

## Build Extensible Primitives

LearningContext is a generic term that applies to Courses, Content Libraries, Learning Pathways, and any number of other collections of content that we want to discretely version and publish. We can centrally define logic around these, while leaving it up to higher layers to model specific types of LearningContexts in a pluggable way.

For instance, it makes sense for there to be a Courses table with a foreign key to the LearningContexts table. That table would hold course-specific fields, and may even have a `null` learning context in the beginning (before any content is created).

This kind of arrangement would lead to a three-layered system:

1. Foundational primitives and logic, implemented only in `openedx-learning`.
2. Specific implementations of those components to define different learning experiences, e.g. "Two-level hierarchical navigation", "randomized user partitioning scheme". These could be implemented in `openedx-learning` or outside. The most popular and useful ones could get folded into the repo over time.
3. Business-level plugins that tie into systems defined in (2), such as `EnrollmentTrackOutlineProcessor`. These would be implemented outside the `openedx-learning` repo (many would live in `edx-platform`).

## Implement Plugins in `edx-platform`

We once attempted to lift the ModuleStores out of `edx-platform` and ran into a rat's nest of dependencies that made the task extremely difficult. My thought with things like this is to have plugin interfaces that go the other way from what we usually do: the core framework logic is in apps in this new repo, and the little plugin objects are created in edx-platform (and optionally elsewhere as well).

So to use a concrete example, say we migrate the `learning_sequences` app in edx-platform today to become part of the navigation app in this new repo. The navigation app will then have the concept of OutlineProcessors: an object interface for the different concerns that have to modify the set of things you can see or access in a course outline. The navigation app would have the logic for running OutlineProcessors, reading them from a list defined in Django settings. The `EnrollmentOutlineProcessor` would be defined in edx-platform (and likely import a bunch of other things from edx-platform), and then be specified in the settings file. By doing this, we can keep some of the crazy logic and dependencies in edx-platform, which would allow us to move things more incrementally, without taking huge risks.

# See Also

Original post:
* https://discuss.openedx.org/t/attacking-the-monolith-by-extracting-a-core/5383
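The OutlineProcessor arrangement described in the roadmap issue can be illustrated with a small, self-contained sketch. Everything here is hypothetical: `OutlineItem`, `CourseOutline`, `usage_keys_to_remove`, `get_user_outline`, and this version of `EnrollmentOutlineProcessor` are illustrative names, not the actual `learning_sequences` API. The framework side (the loop in `get_user_outline`) knows nothing about enrollments; the plugin side only implements the processor interface. In a Django project the settings list would hold dotted paths resolved with `django.utils.module_loading.import_string`; the stdlib stand-in below does the same job so the example runs without Django.

```python
from dataclasses import dataclass, field
from importlib import import_module
from typing import Dict, List, Set, Type


@dataclass(frozen=True)
class OutlineItem:
    usage_key: str
    title: str
    visible_to_audit: bool = True  # hypothetical per-item flag


@dataclass
class CourseOutline:
    course_key: str
    items: List[OutlineItem] = field(default_factory=list)


class OutlineProcessor:
    """Framework-side interface (would live in openedx-learning)."""

    def __init__(self, course_key: str, user: Dict[str, str]):
        self.course_key = course_key
        self.user = user

    def usage_keys_to_remove(self, outline: CourseOutline) -> Set[str]:
        return set()


class EnrollmentOutlineProcessor(OutlineProcessor):
    """Plugin-side implementation (would live in edx-platform)."""

    def usage_keys_to_remove(self, outline: CourseOutline) -> Set[str]:
        if self.user.get("enrollment_mode") != "audit":
            return set()
        return {item.usage_key for item in outline.items if not item.visible_to_audit}


def import_string(dotted_path: str):
    """Resolve 'package.module.Name' to the object it names (stdlib stand-in)."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    return getattr(import_module(module_path), attr_name)


def load_processors(dotted_paths: List[str]) -> List[Type[OutlineProcessor]]:
    """Turn a settings-style list of dotted paths into processor classes."""
    return [import_string(path) for path in dotted_paths]


def get_user_outline(outline: CourseOutline, user: Dict[str, str],
                     processor_classes: List[Type[OutlineProcessor]]) -> CourseOutline:
    """Run every configured processor and drop whatever any of them removes."""
    removed: Set[str] = set()
    for processor_class in processor_classes:
        processor = processor_class(outline.course_key, user)
        removed |= processor.usage_keys_to_remove(outline)
    kept = [item for item in outline.items if item.usage_key not in removed]
    return CourseOutline(outline.course_key, kept)
```

Because each processor only reports what to remove, processors can be added, removed, or reordered in settings without touching the framework loop, which is the property that lets the heavy dependencies stay in edx-platform.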

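The Publishing component in the roadmap issue exists to avoid the mixed-published state that independent, asynchronous publish steps can produce. One common way to get that property, sketched here as an assumption rather than the planned design, is to have every step write version-specific data and only move a single `published_version` pointer after all steps succeed. `LearningContext`, `publish`, and the step functions are hypothetical names; a real implementation would persist this state in the database and run steps asynchronously.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class LearningContext:
    key: str
    draft_version: int = 0
    published_version: Optional[int] = None
    # Errors recorded against each attempted version (empty list = clean publish).
    publish_log: Dict[int, List[str]] = field(default_factory=dict)


# A publish step builds derived data (outlines, search index, ...) for one version.
PublishStep = Callable[[LearningContext, int], None]


def publish(context: LearningContext, steps: List[PublishStep]) -> bool:
    """Run all steps for the current draft, then flip the published pointer.

    The pointer only moves if every step succeeded, so readers keep seeing
    the previous version instead of a mix of old and new data.
    """
    version = context.draft_version
    errors: List[str] = []
    for step in steps:
        try:
            step(context, version)
        except Exception as exc:  # record the failure and keep collecting errors
            errors.append(f"{getattr(step, '__name__', repr(step))}: {exc}")
    context.publish_log[version] = errors
    if errors:
        return False
    context.published_version = version
    return True
```

When a step fails, the old version stays live and the failed attempt's errors remain in `publish_log`, loosely mirroring the issue's idea of tracking errors and warnings per Learning Context.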
Goals

The goals, quoted from that roadmap ticket:

Refactor the Open edX LMS to enable more dynamic behavior in a scalable way.
New features like V2 Content Libraries and Effort Estimation need a more scalable implementation than what exists in courseware today.

Provide an easier, more reliable path for extensions developers.
Extensions authors should be able to build and test against a smaller repo than edx-platform.

Accelerate monolith breakup.
By delivering incremental improvements to extensibility and performance, we have more reason to work on monolith breakup than our previous “extract all the non-core things” approach, where the benefits were more back-loaded.

Improve separation of edX business concepts from the learning platform.
No edX-specific logic would come over to the new repo.

Advance the decoupling of Studio and the LMS.
Part of this work includes building a new core content data model for the LMS, which would replace the current shared model with Studio.

Create a lightweight foundation for entirely new learning experiences.
This would allow groups like LabXchange to make use of helpful infrastructure and concepts, without having to simultaneously inherit all the technical debt and weight of edx-platform.
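The goal of building a new core content data model, together with the "Build Extensible Primitives" section of the roadmap issue, suggests a Courses table that points at a generic LearningContexts table and may start out with a null reference. A minimal sketch of that relationship, using SQLite from the Python standard library so it runs anywhere; the table names, column names, and helper functions are hypothetical, and the real models would presumably be Django models.

```python
import sqlite3

SCHEMA = """
CREATE TABLE learning_context (
    id INTEGER PRIMARY KEY,
    context_key TEXT NOT NULL UNIQUE,
    published_version INTEGER  -- stays NULL until the first publish
);
CREATE TABLE course (
    id INTEGER PRIMARY KEY,
    course_key TEXT NOT NULL UNIQUE,
    display_name TEXT NOT NULL,
    -- Nullable on purpose: a course row can exist before any content does.
    learning_context_id INTEGER REFERENCES learning_context(id)
);
"""


def create_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
    conn.executescript(SCHEMA)
    return conn


def create_course(conn: sqlite3.Connection, course_key: str, display_name: str) -> int:
    cur = conn.execute(
        "INSERT INTO course (course_key, display_name) VALUES (?, ?)",
        (course_key, display_name),
    )
    conn.commit()
    return cur.lastrowid


def attach_learning_context(conn: sqlite3.Connection, course_id: int, context_key: str) -> int:
    """Create the generic LearningContext row once content exists, then link it."""
    cur = conn.execute(
        "INSERT INTO learning_context (context_key) VALUES (?)", (context_key,)
    )
    context_id = cur.lastrowid
    conn.execute(
        "UPDATE course SET learning_context_id = ? WHERE id = ?", (context_id, course_id)
    )
    conn.commit()
    return context_id


def course_context_id(conn: sqlite3.Connection, course_id: int):
    row = conn.execute(
        "SELECT learning_context_id FROM course WHERE id = ?", (course_id,)
    ).fetchone()
    return row[0]
```

Course-specific fields stay on `course`, while versioning and publishing logic only needs to know about `learning_context`, so libraries or pathways could get their own tables pointing at the same primitive.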

Status

I’ve created an openedx-learning repo for this and checked in some very rough prototyping code. I’ve also got a lot of data model changes that I haven’t pushed up yet. It is absolutely not worth trying to run or play with at this point.

I also have a GitHub issue where I’m hashing out some content data modeling considerations, but again, this is very rough, and at this point it reads much more like a stream of consciousness than a proposal. I don’t think it’s worth reviewing right now. I’m posting about it in the spirit of transparency, but the data model is a half-baked mess, and the terminology for various concepts shifts several times over the course of the ticket. I probably need another few days of work to hammer it into a coherent proposal, and I’m on PTO next week. You are welcome to read and comment if you’re interested, but it may not be a great use of your time.

Request

The main things I’m looking for right now are:

Use cases around versioned content as it relates to unit composition (figuring out which content in a unit is shown to which people) and v2 content library usage.
Information about desired content groupings and structures, i.e. anything that would deviate from Open edX’s typical Section → Subsection → Unit → Module hierarchy.
Weird content edge cases.
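Unit composition, deciding what is in the unit for which people, rests on the Partitioning component from the roadmap issue. One stateless way to place users into partition groups is to hash the partition ID together with the user ID, so the same user always lands in the same group without storing an assignment row. This is an illustrative technique, not a description of how edx-platform's existing partition schemes work (they may persist assignments instead); `UserPartition`, `Group`, and `group_for_user` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Group:
    id: int
    name: str


@dataclass(frozen=True)
class UserPartition:
    id: int
    name: str
    groups: Tuple[Group, ...]


def group_for_user(user_id: int, partition: UserPartition) -> Group:
    """Deterministically map a user to one group of a partition.

    Including the partition ID in the hash means different partitions
    split the same population independently of each other.
    """
    digest = hashlib.sha256(f"{partition.id}:{user_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(partition.groups)
    return partition.groups[index]
```

The trade-off is that changing the list of groups reshuffles existing users, which is one reason a persisted assignment might be preferred for long-running experiments.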

Some specific examples of edge cases that I’m trying to better understand:

How important is the ability to change the UsageKey of a piece of content and not break state stored against that content (e.g. completion, score, XBlock user state)?
Aside from ProblemBuilder, are there non-edx-platform XBlocks that logically nest other XBlocks inside themselves below the Unit level?
Are Units conceptually flat lists without nesting? We currently have XBlocks like SplitTest that will nest entire units within units, but that serves mostly as a switching mechanism so that the block can change what gets displayed. From the user’s point of view, it’s still a flat list.
Does anyone use unit or module-level start/due dates?
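The flat-list question can be made concrete: if a SplitTest-style block only acts as a switch, then resolving a unit for a given user always yields a flat list of components, even though the stored structure is nested. A hypothetical sketch; `Component`, `SplitTest`, and `flatten_unit` are illustrative names, and `user_groups` (partition ID to group ID) is assumed to come from whatever partitioning logic applies.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Component:
    usage_key: str


@dataclass
class SplitTest:
    """Switch block: shows one child list depending on the user's group."""
    partition_id: int
    children_by_group: Dict[int, List["Node"]]


Node = Union[Component, SplitTest]


def flatten_unit(children: List[Node], user_groups: Dict[int, int]) -> List[str]:
    """Return the flat list of usage keys this user actually sees."""
    result: List[str] = []
    for node in children:
        if isinstance(node, Component):
            result.append(node.usage_key)
        else:
            # Users with no group in this partition see none of the branches.
            chosen = node.children_by_group.get(user_groups.get(node.partition_id), [])
            result.extend(flatten_unit(chosen, user_groups))
    return result
```

If nothing else needs to nest below the Unit level, a data model could store units as ordered lists plus switch metadata instead of a general recursive tree.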

The initial design of the courseware sidestepped a lot of these data modeling questions by reading everything into memory and defining really general interfaces. This was great for prototyping, but has caused us a lot of grief around performance, reliability, introspection, and complexity, causing us to add layer after layer of patchwork fixes over the years.

Any new Learning Core data model needs to accommodate our existing use cases, as well as use cases around new work in progress (like v2 content libraries). My hope is that we can simplify this requirement by dropping some of the extra power that I don’t think anyone actually uses, like being able to set due dates at any level of the hierarchy or being able to arbitrarily nest units (or nest sequences in units). But it’s kind of a wild world of content out there, and I want to check some of my assumptions on what pieces of functionality are still valuable.
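To illustrate what the "due dates at any level" power costs: if any block can carry a date, each block's effective date depends on its whole chain of ancestors, so the model has to walk the tree (or store a denormalized result) for every lookup. The sketch below assumes a simple inheritance rule, where a block uses its own date if set and otherwise inherits its parent's; `Block` and `effective_due_dates` are hypothetical names. Restricting dates to a single level, such as subsections, would reduce this to one lookup on the containing subsection.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Block:
    usage_key: str
    due: Optional[datetime] = None
    children: List["Block"] = field(default_factory=list)


def effective_due_dates(root: Block) -> Dict[str, Optional[datetime]]:
    """Compute every block's due date under the assumed inheritance rule."""
    result: Dict[str, Optional[datetime]] = {}
    stack = [(root, None)]  # (block, due date inherited from its parent)
    while stack:
        block, inherited = stack.pop()
        due = block.due if block.due is not None else inherited
        result[block.usage_key] = due
        for child in block.children:
            stack.append((child, due))
    return result
```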

Comments in this thread would be greatly appreciated. Thanks folks.
