Course structure cache never expires in juniper.3

My Open edX site runs juniper.3, and I have noticed that memcached's memory usage keeps growing.
I spent some time analyzing the contents of memcached and found that most keys start with course_structure:xxxx, which are cached course structures.
In the Django settings, the timeout for the course_structure cache is set to 7200 seconds, but it does not seem to take effect, as many very old keys still exist. I looked into the CourseStructureCache code in common/lib/xmodule/xmodule/modulestore/split_mongo/mongo_connection.py and found the snippet below:

def set(self, key, structure, course_context=None):
    """Given a structure, will pickle, compress, and write to cache."""
    if self.cache is None:
        return None

    with TIMER.timer("CourseStructureCache.set", course_context) as tagger:
        pickled_data = pickle.dumps(structure, 4)  # Protocol can't be incremented until cache is cleared
        tagger.measure('uncompressed_size', len(pickled_data))

        # 1 = Fastest (slightly larger results)
        compressed_pickled_data = zlib.compress(pickled_data, 1)
        tagger.measure('compressed_size', len(compressed_pickled_data))

        # Structures are immutable, so we set a timeout of "never"
        self.cache.set(key, compressed_pickled_data, None)

The comment says "Structures are immutable, so we set a timeout of 'never'". Does this mean that in the current Open edX implementation a course structure stays in memcached forever once it has been created? Is there a mechanism to delete old keys?
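
For context, here is a minimal sketch of how I understand Django's low-level cache API to treat that third argument (the backend name and keys below are just for illustration):

from django.core.cache import caches

cache = caches["default"]  # backend name here is just for illustration

# Explicit timeout in seconds: the key expires after 7200 s.
cache.set("course_structure:v1", b"...", 7200)

# timeout=None: Django treats this as "never expire", so the key stays in
# memcached until memcached itself evicts it under memory pressure.
cache.set("course_structure:v2", b"...", None)

# No timeout argument at all: the backend's TIMEOUT from the CACHES setting
# applies, so a 7200 s TIMEOUT would only take effect in this form.
cache.set("course_structure:v3", b"...")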

Thanks,
Pengcheng

This change was introduced by @feanil and @dave in 2015: Feanil/hotfix commits for rc by feanil · Pull Request #8651 · openedx/edx-platform · GitHub

Guys, do you have any idea why we would want to disable cache expiration for course structure caching?

The reasoning was that a structure document never changes and memcached does LRU eviction anyway, so we might as well keep each structure cached for as long as memcached allows and rely on it to evict the oldest entries as needed when it reaches its memory limit. Cache evictions aren't an error state here; they're just memcached doing its thing.

@pcliu: Are you seeing performance penalties when memcached starts doing the evictions? Is it affecting your cache hit rate?

FWIW, if you’re seeing negative side-effects, I’m open to a PR to remove the explicit None timeout here, and have it fall back to the cache defaults, since I think we have a separate backend defined for course_structure_cache and can tweak those settings independently of the rest of the cache usage. But I think you should be able to just let it go and have memcached clean things up as needed.
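
To sketch what I mean (illustrative only, not a tested change): with the explicit None dropped from the set() call, the structure cache would pick up whatever TIMEOUT its backend defines, e.g.:

CACHES = {
    # ... "default" and the other existing backends stay as they are ...
    "course_structure_cache": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "localhost:11211",  # wherever your structures memcached lives
        "TIMEOUT": 7200,                # illustrative; pick whatever suits your deployment
    },
}

That way expiry would be driven by the TTL you choose for that backend instead of relying purely on memcached's LRU behaviour.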

Thank you.

Hi @dave, thanks for the clarification. Since we are using a single memcached instance for storing login sessions, course structures, and programs (pulled from discovery), it is difficult for us to set a proper eviction policy, as some data types must always be present in the cache.

We once tried the VolatileLRU policy (only keys with an expiration time set are eligible for eviction), but the course structure data, which takes up most of the space, never gets evicted because it has no expiration. Do you suggest using separate memcached instances for different data types?

I would suggest having a separate instance for the structures cache if possible (you can leave everything else on the existing system). You'll also want to tweak the settings of the memcached instance you use for course structure documents and bump the max slab size to 2 MB (the default is 1 MB), because the structures for some very large courses can get that big.
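
If you want to sanity-check that the dedicated instance picked up the larger limit (e.g. after starting memcached with -I 2m), something like this works; pymemcache and the hostname are just what I'd reach for, use whatever client you like:

from pymemcache.client.base import Client

# Hypothetical host for the dedicated structures memcached instance.
client = Client(("structures-memcache.example.com", 11211))

# Keys come back as byte strings with this client.
settings_stats = client.stats("settings")
# Expect 2097152 (2 MB) here rather than the default 1048576 (1 MB).
print(settings_stats[b"item_size_max"])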

This cache will fill up, and the oldest things will start getting evicted, but you shouldn’t care about it. The only thing you’ll care about is your cache hit rate, which should end up being > 99% (it’ll probably end up closer to 99.99% if you leave the expiration as “never”).
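
Checking the hit rate is just a matter of reading get_hits and get_misses from the same stats output, e.g. (again, pymemcache and the hostname are only for illustration):

from pymemcache.client.base import Client

client = Client(("structures-memcache.example.com", 11211))  # hypothetical host
stats = client.stats()

hits = stats[b"get_hits"]
misses = stats[b"get_misses"]
total = hits + misses
print("hit rate: {:.2%}".format(hits / total if total else 0.0))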

Good luck!

I see, thanks for the explanation. @dave