Help please: Very slow load time (10 seconds) for courseware on sections with several subsections and xblocks!

Hey @dave

This is just a follow-up on what we did and an explanation of what I understood from the code.

Investigating the code led me to two old PRs:

https://github.com/edx/edx-platform/pull/14571/

https://github.com/edx/edx-platform/pull/14770/

These PRs point to: https://openedx.atlassian.net/wiki/display/MA/Block+Structure+Cache+Invalidation+Proposal

which might give more background, but it is not publicly accessible, so I had to read through the code instead.

From what I saw, it looks like we switched from BlockStructureCache to BlockStructureStore. This means we moved away from a cache-only approach to a cache-plus-storage approach. This is called tiering. (I learned about this in the process.)

Essentially, when a request tries to access block structure (BS) data and does not find it in the cache, it enters Collect Mode and generates that data for the course, which is a compute-heavy process. And as Dave pointed out in the comment above, if more than one worker hits the cache miss at the same time, this process is repeated once per worker.
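To make the duplicated work concrete, here is a minimal Python sketch. It is only a simulation, not edx-platform code: `simulate_concurrent_misses`, `collect`, and the dict-based cache are hypothetical stand-ins, and a barrier is used to force every worker to look in the cache before any of them writes to it.

```python
import threading


def simulate_concurrent_misses(course_key, num_workers):
    """Return how many times the expensive collect step runs."""
    cache = {}                       # stand-in for the shared cache
    collect_calls = []               # one entry is appended per collect run
    lock = threading.Lock()
    barrier = threading.Barrier(num_workers)

    def collect(key):
        # Stand-in for the compute-heavy Collect Mode step.
        with lock:
            collect_calls.append(key)
        return f"block-structure-for-{key}"

    def worker():
        cached = cache.get(course_key)   # every worker looks in the cache first
        barrier.wait()                   # wait until all workers have looked
        if cached is None:               # all of them missed, so all of them collect
            cache[course_key] = collect(course_key)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(collect_calls)
```

With four workers missing at the same moment, the collect step runs four times instead of once, which is the pattern described above.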

This leads to a slower response time for the course. Our first attempt at a fix assumed the cache was simply too small: we thought that scaling it vertically and making it persistent would solve the problem, because the data would then always be in the cache and ready to serve the request.

That said, as Dave pointed out, the slab size of Memcache is 1 MB, so if the BS is larger than that, it is not stored in the cache and no error is raised either.
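A small Python sketch of that failure mode, assuming the 1 MB limit mentioned above. `SizeLimitedCache` is a hypothetical in-memory stand-in, not a real Memcached client; it only mimics the "drop the value quietly" behaviour so the effect on a later read is visible.

```python
ITEM_LIMIT_BYTES = 1024 * 1024  # the 1 MB limit mentioned above (assumed here)


class SizeLimitedCache:
    """Hypothetical cache that quietly drops values above a size limit."""

    def __init__(self, limit=ITEM_LIMIT_BYTES):
        self._limit = limit
        self._data = {}

    def set(self, key, value):
        # Oversized values are not stored, and no exception is raised.
        if len(value) > self._limit:
            return False
        self._data[key] = value
        return True

    def get(self, key):
        # A dropped value looks exactly like an ordinary cache miss.
        return self._data.get(key)


cache = SizeLimitedCache()
cache.set("small-course", b"x" * 1024)                    # fits, is stored
cache.set("large-course", b"x" * (ITEM_LIMIT_BYTES + 1))  # too big, dropped

small_hit = cache.get("small-course") is not None
large_hit = cache.get("large-course") is not None
```

For the large course every read is a miss, so every request falls back to the expensive path no matter how much memory the cache has.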

This is where our solution of adding S3 storage comes in. Below is the AWS config to add to lms.yml; I am using S3 as the storage strategy.

BLOCK_STRUCTURES_SETTINGS:
  COURSE_PUBLISH_TASK_DELAY: 30
  TASK_DEFAULT_RETRY_DELAY: 30
  TASK_MAX_RETRIES: 5
  STORAGE_CLASS: 'storages.backends.s3boto.S3BotoStorage'
  STORAGE_KWARGS:
    AWS_ACCESS_KEY_ID: '<placeholder>'
    AWS_SECRET_ACCESS_KEY: '<secretkey>'
    AWS_STORAGE_BUCKET_NAME: 'bucket-name'
  DIRECTORY_PREFIX: "/directory-name/"
  PRUNING_ACTIVE: true

Once you have done this, you probably need to regenerate the BS data and store it in the S3 bucket. You can use:

./manage.py lms generate_course_blocks --all_courses --with_storage

This saves it in the bucket. Now, on a cache miss, the workers do not enter Collect Mode; instead they make a network call and fetch the data from the bucket. This turned a ~10 second call into a ~5 second call.
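The lookup order described above can be sketched in Python roughly like this. It is a generic illustration of tiering, not the actual BlockStructureStore code: `get_block_structure` is a hypothetical helper, and plain dicts stand in for the cache and the S3 bucket.

```python
def get_block_structure(course_key, cache, storage, collect):
    """Return (data, source), trying cache, then storage, then collect."""
    data = cache.get(course_key)
    if data is not None:
        return data, "cache"

    # Cache miss: fetch from storage (a network call in the real setup).
    data = storage.get(course_key)
    if data is not None:
        cache[course_key] = data     # refill the cache for the next request
        return data, "storage"

    # Not in storage either: fall back to the compute-heavy path.
    data = collect(course_key)
    storage[course_key] = data
    cache[course_key] = data
    return data, "collect"


cache, storage = {}, {}


def collect(key):
    # Stand-in for Collect Mode.
    return f"block-structure-for-{key}"


first = get_block_structure("course-v1:demo", cache, storage, collect)[1]
cache.clear()                        # simulate the cache entry being evicted
second = get_block_structure("course-v1:demo", cache, storage, collect)[1]
third = get_block_structure("course-v1:demo", cache, storage, collect)[1]
```

Here `first` comes from collect, `second` from storage after the eviction, and `third` from the refilled cache, so only the very first request pays the full cost.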

There are a few switches you need to activate:

block_structure.storage_backing_for_cache

block_structure.raise_error_when_not_found

block_structure.invalidate_cache_on_publish

Also create a BlockStructureConfiguration with version 1 and cache expiration set to None.

Let me know if I missed something or misunderstood anything.
