Okay, I did some profiling of this last night. I’m doing a very quick/rough writeup before I start my work day because I’m further behind on my PR reviews than usual these past two weeks, and I don’t have time to do a more thorough analysis. There’s a lot to dig into here.
Reproduction Steps
If you want to check it out for yourself:
- Use the `perf_debug_slow_seq` branch I put up with hacky profiling middleware.
- Create a course and import the Slow Course into it. I named mine `course-v1:Appsembler+OmarMattPerfTestCourse+2020-08-28`.
- Create a `stats` folder in your edx-platform directory.
- Load http://localhost:18000/courses/course-v1:Appsembler+OmarMattPerfTestCourse+2020-08-28/courseware/7a62b1fd2b2a4a22a351a6c6d55af2f1/b215a03a16fa4d1687d8921521fda1a5/ once without profiling to make sure any process-level startup overhead is accounted for.
- Add `profile_name=/edx/app/edxapp/edx-platform/stats/omar_seq1.stats` as a querystring parameter to that URL (with the first Unit selected), and make the request.
- Examine the newly generated stats file with `snakeviz` in a separate virtualenv.
- Or just look at the stats file I generated locally: omar_seq1.stats (541.5 KB)
Running on my local devstack:
- This sequence takes 8+ seconds to load with the django debug toolbar enabled, on all requests after the first one (which is much longer).
- It’s about 5 seconds with the debug toolbar disabled (which is why it’s disabled in my branch).
Analysis
Some big items stick out:
- ~31% of the time is spent in `add_staff_markup`, which is what builds all the debug information and UI that is visible only to course teams. In general, staff views will take longer to render than student views (though there are some penalties in the other direction, particularly since being staff short-circuits various access checks). There is some crazy old legacy stuff in there, and I did not have time to dive into the details.
- ~20% of the time is spent doing various template variable substitutions (`replace_static_urls`, `replace_jump_to_id_urls`, `replace_course_urls`), which boil down to a whole bunch of regular expression `sub` calls.
- ~25% of the time is spent on edx-when's fallback behavior, where it walks up the entire ancestry tree whenever you ask for an attribute value related to start/due dates. I believe that's actually a backwards-compatibility feature, so that edx-when doesn't break when it encounters a course that has not been published since edx-when was created.
- An underlying issue that is deep under the covers everywhere is that OpaqueKeys are surprisingly slow and wasteful. I had a big PR that really tweaked that performance, but I had to revert it because of safety issues. You can see that whole saga in my PRs for opaque-keys. This has the potential of maybe 10% savings overall (based on time seen in OpaqueKey's `__eq__` method), though that potential savings probably heavily overlaps with the potential edx-when savings.
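To make the `__eq__` cost concrete, here is a toy illustration (not the actual opaque-keys implementation, and these class and field names are made up): if key equality compares many parsed fields on every call, the millions of comparisons that happen during rendering add up, while caching a canonical serialized form at construction time reduces `__eq__` to a single string comparison.

```python
class SlowKey:
    """Toy key whose equality compares every field, every time."""

    def __init__(self, org, course, run, block_type, block_id):
        self.org, self.course, self.run = org, course, run
        self.block_type, self.block_id = block_type, block_id

    def __eq__(self, other):
        return (
            isinstance(other, SlowKey)
            and self.org == other.org
            and self.course == other.course
            and self.run == other.run
            and self.block_type == other.block_type
            and self.block_id == other.block_id
        )


class FastKey(SlowKey):
    """Toy key that serializes once at construction, so __eq__ is one compare."""

    def __init__(self, *fields):
        super().__init__(*fields)
        self._serialized = ":".join(fields)  # computed once, reused forever

    def __eq__(self, other):
        return isinstance(other, FastKey) and self._serialized == other._serialized

    def __hash__(self):
        return hash(self._serialized)
```

The real opaque-keys classes carry far more parsing and validation logic than this, which is exactly why changes there are so easy to get subtly wrong.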
Remediation: Low Level
There are things here that we can dive into:
More efficient template substitutions
Template variable substitution can be done more efficiently. The vast majority of content carries no substitutions at all. Even for content that does, all of that information is available to us at publish time from Studio. The values we substitute in are configured in the LMS, but detecting whether any substitutions exist, and where the slots are, could be done at publish time.
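A minimal sketch of that idea (the function names and markers below are hypothetical, not actual edx-platform code): scan each block's content once at publish time and store a flag, then skip the regex passes entirely at render time for content with no substitution markers.

```python
import re

# Hypothetical markers roughly corresponding to what replace_static_urls,
# replace_jump_to_id_urls, and replace_course_urls look for.
SUBSTITUTION_MARKERS = re.compile(r"/static/|/jump_to_id/|/course/")


def analyze_at_publish(html: str) -> bool:
    """Run once per block at publish time: does this content need substitution?"""
    return SUBSTITUTION_MARKERS.search(html) is not None


def render(html: str, needs_substitution: bool, substitute) -> str:
    """At render time, only pay the regex cost for blocks that need it."""
    if not needs_substitution:
        return html  # fast path: the vast majority of content
    return substitute(html)
```

Under this scheme the per-request cost for most blocks drops from several full-regex passes to a single boolean check.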
edx-when
The backwards compatibility behavior in edx-when's ancestry crawling is something that could either be turned off or made more efficient. This is an example of a problem that's endemic to the way XBlocks work: it's a very object-centric instead of data-centric view of the world. Every individual block fetching field data looks up its ancestry list without realizing that all its peers are doing the exact same thing, recalculating the same ancestors over and over again. It's possible to add caching to make it work under the covers, but we have to be careful of memory leaks. A request cache might work reasonably well here, though.
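A minimal sketch of the request-cache idea (the names here are hypothetical, not edx-when's actual API): memoize the ancestry lookup in a dict that lives only for the duration of one request, so sibling blocks share one walk up the tree instead of each recomputing it.

```python
def make_ancestry_lookup(parent_of):
    """Build a per-request ancestry lookup over a block tree.

    parent_of maps each block id to its parent id (None at the root).
    """
    cache = {}  # scoped to one request; discarded afterwards, so no leak

    def ancestors(block_id):
        """Return the ancestor chain, reusing any previously computed suffix."""
        if block_id in cache:
            return cache[block_id]
        parent = parent_of.get(block_id)
        chain = [] if parent is None else [parent] + ancestors(parent)
        cache[block_id] = chain
        return chain

    return ancestors
```

With this, each unique ancestor chain is computed once per request; every sibling Unit asking for a start/due date fallback hits the cache on the shared portion of the tree.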
opaque-keys
There be dragons here. That being said, I think that the PR that’s been reverted a few times now is actually pretty close to landable. But it’s possible to break things in really subtle and dangerous ways, as the OpaqueKeys are both absolutely critical to everything in Open edX and way more complicated than they have any right to be.
Staff-only Debug Code
I honestly haven’t looked at this in ages, but I’d bet it hasn’t been seriously profiled for optimization in years.
Remediation: High Level
Okay, with all that being said, the time taken to actually render the XBlock for the Unit directly (as you would for a mobile view) is about 250 ms. This would be a request that looks like: http://localhost:18000/xblock/block-v1:Appsembler+OmarMattPerfTestCourse+2020-08-28+type@vertical+block@057c430ebfa644abb7f39552a22f5bb0
We have mounds of complexity in edx-platform with respect to the XBlock runtime, and much of that comes from supporting all kinds of weird access patterns and maintaining backwards compatibility. I’ve fought this battle for a long time in various PRs (like this one). Various types of caching feed the complexity of the system because the core shape of the APIs weren’t built with it in mind. It’s too easy to introduce regressions that spiral out of control because the problems scale with the size of the entire sequence.
I’ve come to the conclusion that the only sustainable path out is not to make the XBlock runtime do the things it does faster, but to ask it to do less. For 99% of use cases, XBlocks only need the world that is their own Unit. That’s a much more tractable problem, and it’s the direction we’ve gone in with the new Courseware MFE work. We’re building a new Course Outline piece that runs entirely outside of the XBlock runtime (and is much, much faster). I talk a lot more about this in my post on the Arch Decision Record for the Role of XBlocks.
Apologies to anybody reading this who is waiting on a PR review from me. Performance has been something that I’ve been pushing for a long time at edX, and I can’t miss the opportunity to have this conversation with the community at large.
Take care.