Atlas MongoDB Performance Issues (un-indexed queries)

Is there a way to see what queries are taking longer to complete for MongoDB for Open edX?

Where getting a lot of document scan performance issues notifications related to un-indexed queries from Atlas that suggests it’s doing a full-scan of the data.

If anyone has MongoDB queries or a way to configure indexes for performance that would help out. We’re currently on the Maple release in production.

image

cc @dave @MoisesGonzalezS

I don’t think I’ve run into this on the sites that I’ve worked on. What does their query profiler say?

We’re going to implement New Relic and see if that can help us monitor MongoDB and other parts of the platform for now. We’ll post here if we find out anything further in regards to MongoDB performance issues.

@MoisesGonzalezS @dave
We’re receiving this error from the LMS often in our logs concerning pymongo.errors.ServerSelectionTimeoutError: No replica set members found yet. Any way that you could supply us with your Django configuration for MongoDB?

LMS Log Error

tutor_local-lms-1         | During handling of the above exception, another exception occurred:
tutor_local-lms-1         | 
tutor_local-lms-1         | Traceback (most recent call last):
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 47, in inner
tutor_local-lms-1         |     response = get_response(request)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/crum/__init__.py", line 97, in __call__
tutor_local-lms-1         |     response = self.get_response(request)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 49, in inner
tutor_local-lms-1         |     response = response_for_exception(request, exc)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 114, in response_for_exception
tutor_local-lms-1         |     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 153, in handle_uncaught_exception
tutor_local-lms-1         |     return callback(request)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/djangoapps/util/views.py", line 95, in wrapper
tutor_local-lms-1         |     return func(request, *args, **kwargs)
tutor_local-lms-1         |   File "/openedx/edx-platform/./lms/djangoapps/static_template_view/views.py", line 129, in render_500
tutor_local-lms-1         |     return HttpResponseServerError(render_to_string('static_templates/server-error.html', {}, request=request))
tutor_local-lms-1         |   File "/openedx/edx-platform/common/djangoapps/edxmako/shortcuts.py", line 178, in render_to_string
tutor_local-lms-1         |     return template.render(dictionary, request)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/djangoapps/edxmako/template.py", line 82, in render
tutor_local-lms-1         |     return self.mako_template.render_unicode(**context_dictionary)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mako/template.py", line 478, in render_unicode
tutor_local-lms-1         |     return runtime._render(
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 878, in _render
tutor_local-lms-1         |     _render_context(
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 920, in _render_context
tutor_local-lms-1         |     _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 947, in _exec_template
tutor_local-lms-1         |     callable_(context, *args, **kwargs)
tutor_local-lms-1         |   File "/tmp/mako_lms/f1543ed24543a70de2c17ebfc03bb76a/main.html.py", line 357, in render_body
tutor_local-lms-1         |     runtime._include_file(context, (static.get_template_path('header.html')), _template_uri, online_help_token=online_help_token)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 795, in _include_file
tutor_local-lms-1         |     callable_(ctx, **kwargs)
tutor_local-lms-1         |   File "/tmp/mako_lms/f1543ed24543a70de2c17ebfc03bb76a/header.html.py", line 34, in render_body
tutor_local-lms-1         |     runtime._include_file(context, (static.get_template_path(relative_path='header/header.html')), _template_uri, online_help_token=online_help_token)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 795, in _include_file
tutor_local-lms-1         |     callable_(ctx, **kwargs)
tutor_local-lms-1         |   File "/tmp/mako_lms/f1543ed24543a70de2c17ebfc03bb76a/header/header.html.py", line 113, in render_body
tutor_local-lms-1         |     runtime._include_file(context, 'navbar-authenticated.html', _template_uri, online_help_token=online_help_token)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 795, in _include_file
tutor_local-lms-1         |     callable_(ctx, **kwargs)
tutor_local-lms-1         |   File "/tmp/mako_lms/f1543ed24543a70de2c17ebfc03bb76a/choose-aerospace/lms/templates/header/navbar-authenticated.html.py", line 111, in render_body
tutor_local-lms-1         |     runtime._include_file(context, 'user_dropdown.html', _template_uri)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 795, in _include_file
tutor_local-lms-1         |     callable_(ctx, **kwargs)
tutor_local-lms-1         |   File "/tmp/mako_lms/f1543ed24543a70de2c17ebfc03bb76a/header/user_dropdown.html.py", line 56, in render_body
tutor_local-lms-1         |     resume_block = retrieve_last_sitewide_block_completed(self.real_user)
tutor_local-lms-1         |   File "/openedx/edx-platform/./openedx/core/djangoapps/user_api/accounts/utils.py", line 171, in retrieve_last_sitewide_block_completed
tutor_local-lms-1         |     item = modulestore().get_item(candidate_block_key, depth=1)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/mixed.py", line 82, in inner
tutor_local-lms-1         |     retval = func(field_decorator=strip_key_collection, *args, **kwargs)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/mixed.py", line 248, in get_item
tutor_local-lms-1         |     return store.get_item(usage_key, depth, **kwargs)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/split_draft.py", line 283, in get_item
tutor_local-lms-1         |     return super().get_item(usage_key, depth=depth, **kwargs)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/split.py", line 1171, in get_item
tutor_local-lms-1         |     with self.bulk_operations(usage_key.course_key):
tutor_local-lms-1         |   File "/opt/pyenv/versions/3.8.12/lib/python3.8/contextlib.py", line 113, in __enter__
tutor_local-lms-1         |     return next(self.gen)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/__init__.py", line 186, in bulk_operations
tutor_local-lms-1         |     self._begin_bulk_operation(course_id, ignore_case)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/__init__.py", line 251, in _begin_bulk_operation
tutor_local-lms-1         |     self._start_outermost_bulk_operation(bulk_ops_record, course_key, ignore_case)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/split.py", line 245, in _start_outermost_bulk_operation
tutor_local-lms-1         |     bulk_write_record.initial_index = self.db_connection.get_course_index(course_key, ignore_case=ignore_case)
tutor_local-lms-1         |   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/mongo_connection.py", line 374, in get_course_index
tutor_local-lms-1         |     return self.course_index.find_one(query)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mongodb_proxy.py", line 55, in wrapper
tutor_local-lms-1         |     return func(*args, **kwargs)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/collection.py", line 1273, in find_one
tutor_local-lms-1         |     for result in cursor.limit(-1):
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1156, in next
tutor_local-lms-1         |     if len(self.__data) or self._refresh():
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1050, in _refresh
tutor_local-lms-1         |     self.__session = self.__collection.database.client._ensure_session()
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1810, in _ensure_session
tutor_local-lms-1         |     return self.__start_session(True, causal_consistency=False)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1763, in __start_session
tutor_local-lms-1         |     server_session = self._get_server_session()
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1796, in _get_server_session
tutor_local-lms-1         |     return self._topology.get_server_session()
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/topology.py", line 487, in get_server_session
tutor_local-lms-1         |     self._select_servers_loop(
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/topology.py", line 208, in _select_servers_loop
tutor_local-lms-1         |     raise ServerSelectionTimeoutError(
tutor_local-lms-1         | pymongo.errors.ServerSelectionTimeoutError: No replica set members found yet
tutor_local-lms-1         | 
tutor_local-lms-1         | During handling of the above exception, another exception occurred:
tutor_local-lms-1         | 
tutor_local-lms-1         | Traceback (most recent call last):
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/mongodb_proxy.py", line 55, in wrapper
tutor_local-lms-1         |     return func(*args, **kwargs)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/collection.py", line 1273, in find_one
tutor_local-lms-1         |     for result in cursor.limit(-1):
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1156, in next
tutor_local-lms-1         |     if len(self.__data) or self._refresh():
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1050, in _refresh
tutor_local-lms-1         |     self.__session = self.__collection.database.client._ensure_session()
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1810, in _ensure_session
tutor_local-lms-1         |     return self.__start_session(True, causal_consistency=False)
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1763, in __start_session
tutor_local-lms-1         |     server_session = self._get_server_session()
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1796, in _get_server_session
tutor_local-lms-1         |     return self._topology.get_server_session()
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/topology.py", line 487, in get_server_session
tutor_local-lms-1         |     self._select_servers_loop(
tutor_local-lms-1         |   File "/openedx/venv/lib/python3.8/site-packages/pymongo/topology.py", line 208, in _select_servers_loop
tutor_local-lms-1         |     raise ServerSelectionTimeoutError(
tutor_local-lms-1         | pymongo.errors.ServerSelectionTimeoutError: No replica set members found yet

We are configuring our Django configuration like so. I’m only showing the LMS configuration, however, the only difference between the following and the CMS is that the CMS has the "read_preference": "PRIMARY_PREFERRED". Note that I have only seen the error above in the LMS logs. I’m thinking that this is a write issue with the modulestore. Please advise. We can change the LMS to have "read_preference": "PRIMARY_PREFERRED" as well.

>>> settings.MODULESTORE
{
    "default": {
        "ENGINE": "xmodule.modulestore.mixed.MixedModuleStore",
        "OPTIONS": {
            "mappings": {},
            "stores": [
                {
                    "NAME": "split",
                    "ENGINE": "xmodule.modulestore.split_mongo.split_draft.DraftVersioningModuleStore",
                    "DOC_STORE_CONFIG": {
                        "db": "openedx",
                        "host": "shard-00-02.1spck.mongodb.net:27017,shard-00-00.1spck.mongodb.net:27017,shard-00-01.1spck.mongodb.net:27017",
                        "replicaSet": "atlas-8mdjf4-shard-0",
                        "user": "edxapp",
                        "port": 443,
                        "collection": "modulestore",
                        "ssl": True,
                        "socketTimeoutMS": 6000,
                        "connectTimeoutMS": 2000,
                        "auth_source": "admin",
                        "read_preference": "SECONDARY_PREFERRED",
                        "password": "hidden",
                        "authsource": "admin",
                        "retryWrites": True,
                        "w": "majority",
                    },
                    "OPTIONS": {
                        "default_class": "xmodule.hidden_module.HiddenDescriptor",
                        "fs_root": "/openedx/data/modulestore",
                        "render_template": "common.djangoapps.edxmako.shortcuts.render_to_string",
                    },
                },
                {
                    "NAME": "draft",
                    "ENGINE": "xmodule.modulestore.mongo.DraftMongoModuleStore",
                    "DOC_STORE_CONFIG": {
                        "db": "openedx",
                        "host": "shard-00-02.1spck.mongodb.net:27017,shard-00-00.1spck.mongodb.net:27017,shard-00-01.1spck.mongodb.net:27017",
                        "replicaSet": "atlas-8mdjf4-shard-0",
                        "user": "edxapp",
                        "port": 443,
                        "collection": "modulestore",
                        "ssl": True,
                        "socketTimeoutMS": 6000,
                        "connectTimeoutMS": 2000,
                        "auth_source": "admin",
                        "read_preference": "SECONDARY_PREFERRED",
                        "password": "hidden",
                        "authsource": "admin",
                        "retryWrites": True,
                        "w": "majority",
                    },
                    "OPTIONS": {
                        "default_class": "xmodule.hidden_module.HiddenDescriptor",
                        "fs_root": "/openedx/data/modulestore",
                        "render_template": "common.djangoapps.edxmako.shortcuts.render_to_string",
                    },
                },
            ],
        },
    }
}

@MoisesGonzalezS @dave

This article suggests adding in a connection timeout because pymongo client likes to use threads. I’m not sure that we’re getting this 500 internal error like this article indicates. Just the pymongo.errors.ServerSelectionTimeoutError: No replica set members found yet error is what we’re seeing.

MongoDB error when running a web app but not when running normal script : Forums : PythonAnywhere (www.pythonanywhere.com)

connectTimeoutMS=30000, socketTimeoutMS=None, socketKeepAlive=True, connect=False, maxPoolsize=1

From pymongo documentation.
mongo_client – Tools for connecting to MongoDB — PyMongo 3.10.1 documentation

  • maxPoolSize (optional): The maximum allowable number of concurrent connections to each connected server. Requests to a server will block if there are maxPoolSize outstanding connections to the requested server. Defaults to 100. Cannot be 0.

  • socketTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait for a response after sending an ordinary (non-monitoring) database operation before concluding that a network error has occurred. Defaults to None (no timeout).

  • socketKeepAlive: (boolean) DEPRECATED Whether to send periodic keep-alive packets on connected sockets. Defaults to True. Disabling it is not recommended, see https://docs.mongodb.com/manual/faq/diagnostics/#does-tcp-keepalive-time-affect-mongodb-deployments”,

  • connect (optional): if True (the default), immediately begin connecting to MongoDB in the background. Otherwise connect on the first operation.

  • connectTimeoutMS: (integer or None) Controls how long (in milliseconds) the driver will wait during server monitoring when connecting a new socket to a server before concluding the server is unavailable. Defaults to 20000 (20 seconds).

@MoisesGonzalezS @dave
Really just trying to figure out how to best connect to a clustered MongoDB configuration with Open edX.

I found this article High Availability and PyMongo — PyMongo 3.10.1 documentation but it would be nice to know what others are configuring mongodb_parameters with. And if anyone is using AtlasDB, if there are additional configuration settings needed.

@MoisesGonzalezS @dave @BrianMesick

Are these the MongoDB indexes that need to be setup? Can anyone send me a list of MongoDB indexes for each collection, to ensure that we’re setup properly in production?

contentstore - ModuleContentStore - create collection indexes

edx-platform/xmodule/contentstore/mongo.py at master · openedx/edx-platform (github.com)

modulestore - ModuleModuleStore - create collection indexes

edx-platform/xmodule/modulestore/mongo/base.py at master · openedx/edx-platform (github.com)

split_mongo - MongoPersistenceBackend - create collection indexes

edx-platform/xmodule/modulestore/split_mongo/mongo_connection.py at master · openedx/edx-platform (github.com)

Here are the MongoDB indexes with tutor dev for the nutmeg release.

> show dbs
admin    0.000GB
config   0.000GB
local    0.000GB
openedx  0.318GB

> use openedx
switched to db openedx

> show collections
fs.chunks
fs.files
modulestore.active_versions
modulestore.definitions
modulestore.structures

> db.fs.chunks.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "openedx.fs.chunks"
        },
        {
                "v" : 2,
                "unique" : true,
                "key" : {
                        "files_id" : 1,
                        "n" : 1
                },
                "name" : "files_id_1_n_1",
                "ns" : "openedx.fs.chunks"
        }
]
> db.fs.files.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "openedx.fs.files"
        },
        {
                "v" : 2,
                "key" : {
                        "filename" : 1,
                        "uploadDate" : 1
                },
                "name" : "filename_1_uploadDate_1",
                "ns" : "openedx.fs.files"
        }
]
> db.modulestore.active_versions.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "openedx.modulestore.active_versions"
        }
]
> db.modulestore.definitions.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "openedx.modulestore.definitions"
        }
]
> db.modulestore.structures.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "openedx.modulestore.structures"
        }
]

@MoisesGonzalezS @dave @BrianMesick

We have the following collections in production. I believe the ones outlined in red box came with the split modulestore. The other collections that are there came from a release prior to the split modulestore.

This difference was meant to help with load. The LMS is set to prefer reading from the two secondaries (SECONDARY_PREFERRED) since they represent more read capacity overall than the single primary. But Studio has a number of scenarios where it basically does, "publish this course, and then have a bunch of tasks immediately read the values we just published and use that to bootstrap various data models. Because of this, even a few milliseconds of replication lag could introduce buggy behavior, and so Studio was set to always work off the primary if possible.

Does this error happen all the time, or is it intermittent? I know it’s a dumb question, but is there any chance some of the config values were miscopied between the configurations?

That being said, this seems like it would be a separate issue from your initial thread topic of unindexed queries being run. Though it could definitely contribute to slow load times if you have connection timeouts when trying to access MongoDB.

I got the same index value results that you got when running things locally on my devstack. I think the most likely place where this could be going wrong is modulestore.active_versions, which should have the (org, course, run) index on it. The example you point to here:

There is a management command (ensure_indexes) that will force that index to be rebuilt. But when trying to run it locally, I get an error about duplicate keys. I’m not clear on the root cause of that. I’m also not clear on when this initialization is supposed to happen in Tutor’s installation.

@dave Thanks. I’ll look into that additional missing index. We are getting that mongo error above often (two times a minute).

@dave
I’m going to try and leave 'read_preference': 'SECONDARY_PREFERRED' and match the additional configuration here. Looks like I forgot the socketTimeoutMS and connectTimeoutMS. That will probably fix this. I’ll let you know if it does. Thanks.

edx-platform/lms/envs/common.py at master · openedx/edx-platform (github.com)

DOC_STORE_CONFIG = {
    'db': 'edxapp',
    'host': 'localhost',
    'replicaSet': '',
    'password': 'password',
    'port': 27017,
    'user': 'edxapp',
    'collection': 'modulestore',
    'ssl': False,
    # https://api.mongodb.com/python/2.9.1/api/pymongo/mongo_client.html#module-pymongo.mongo_client
    # default is never timeout while the connection is open,
    #this means it needs to explicitly close raising pymongo.errors.NetworkTimeout
    'socketTimeoutMS': 6000,
    'connectTimeoutMS': 2000,  # default is 20000, I believe raises pymongo.errors.ConnectionFailure
    # Not setting waitQueueTimeoutMS and waitQueueMultiple since pymongo defaults to nobody being allowed to wait
    'auth_source': None,
    'read_preference': 'SECONDARY_PREFERRED'
}

@dave MongoDB configuration for the LMS works. It removes that pymongo.errors.ServerSelectionTimeoutError: No replica set members found yet mentioned above. Thanks for your help on this. I’ll continue to get New Relic setup to see what else what have as well and that additional Mongo index that you mentioned for modulestore.active_versions.

@dave It looks like that management command for ensure_indexes is here.
edx-platform/common/djangoapps/xblock_django/management/commands/ensure_indexes.py at master · openedx/edx-platform (github.com)

@dave Yeah, I’m getting that same duplicate key error which is probably why this MongoDB key was never setup. It’s from a content library.

LMS Container

./manage.py lms ensure_indexes

Error - duplicate key error

Traceback (most recent call last):
  File "./manage.py", line 106, in <module>
    execute_from_command_line([sys.argv[0]] + django_args)
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/openedx/venv/lib/python3.8/site-packages/django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "/openedx/edx-platform/common/djangoapps/xblock_django/management/commands/ensure_indexes.py", line 21, in handle
    modulestore().ensure_indexes()
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/mixed.py", line 1041, in ensure_indexes
    store.ensure_indexes()
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/split.py", line 3305, in ensure_indexes
    self.db_connection.ensure_indexes()
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/mongo_connection.py", line 510, in ensure_indexes
    create_collection_index(
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/mongo_utils.py", line 104, in create_collection_index
    raise exc
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/mongo_utils.py", line 92, in create_collection_index
    collection.create_index(keys, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/mongodb_proxy.py", line 117, in __call__
    return self.proxied_object(*args, **kwargs)
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/collection.py", line 1995, in create_index
    self.__create_index(keys, kwargs, session, **cmd_options)
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/collection.py", line 1890, in __create_index
    self._command(
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/collection.py", line 235, in _command
    return sock_info.command(
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/pool.py", line 603, in command
    return command(self.sock, dbname, spec, slave_ok,
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/network.py", line 165, in command
    helpers._check_command_response(
  File "/openedx/venv/lib/python3.8/site-packages/pymongo/helpers.py", line 152, in _check_command_response
    raise DuplicateKeyError(errmsg, code, response)
pymongo.errors.DuplicateKeyError: Index build failed: 371d97eb-13f8-48f3-9420-a31e3c0c61c5: Collection openedx.modulestore.active_versions ( 93430788-fa2e-4fb2-9eb3-f9b31543f91e ) :: caused by :: E11000 duplicate key error collection: openedx.modulestore.active_versions index: org_1_course_1_run_1 dup key: { org: "NCCTE", course: "IM11-Practice-Exam", run: "library" }

Looking at the AtlasDB openedx.modulestore.active_versions it does appear that we have two results for that query { org: "NCCTE", course: "IM11-Practice-Exam", run: "library" }

We only have one content library with that name.
image

@dave
I checked the openedx.modulestore.structures collection for the course and verified what source_library_version that it was using and compared that with the openedx.modulestore.active_versions _id for both of the documents that had duplicate for { org: "NCCTE", course: "IM11-Practice-Exam", run: "library" }.

I removed the one that was not referenced in the course structure. After that, I was able to run successfully the ./manage.py lms ensure_indexes command without error.

I verified that the new index was created.

@dave

We were curious if you know if a later version of pymongo would require you to have to put those timeouts in for the LMS pymongo.MongoClient configuration or not.

'socketTimeoutMS': 6000,
'connectTimeoutMS': 2000,

We received an error that event-tracking repo needed a specific version of pymongo to work when trying to install latest pymongo==4.6. Looking at PyMongo — MongoDB Drivers MongoDB compatibility and Python language you could install pymongo >= 3.10 for maple.

Looks like event-tracking repo event-tracking/requirements/base.in at 1.1.1 · openedx/event-tracking (github.com) from EdX requires pymongo>=2.7.2,<4.0.0, so we’d have to go with 3.13 version.


After upgrading pymongo==3.13.0 for Maple, based on what edx-platform master branch was doing, we noticed this warning message below. I’m curious to know if you’re getting this same warning and if EdX did anything with subprocesses (os.fork) or not prior to making a call to pymongo.MongoClient. I’m assuming that edx-platform is not multiprocessing with pymongo.MongoClient.

Pymongo UserWarning in multiprocessing · Issue #1234 · MongoEngine/mongoengine (github.com)

lms_1                        | /openedx/venv/lib/python3.8/site-packages/pymongo/topology.py:175: UserWarning: MongoClient opened before fork. Create MongoClient only after forking. See PyMongo's documentation for details: https://pymongo.readthedocs.io/en/stable/faq.html#is-pymongo-fork-safe

It looks like the latest version of pymongo does provide defaults for those values, though they’re bad ones (the socket timeout default is to wait forever). So you should specify something here even if did work without it.

So edX runs gunicorn instead of uwsgi, so the configuration is going to be different, and I’m not really familiar with tweaking the uwsgi side of things. (I’m no longer at edX, but most of my operational experience with the platform at scale dates comes from there.) That being said, I think I know the basics for how this worked for gunicorn and why modulestore initialization happens the way that it does, and I can give some context there.

The modulestore initialization for LMS happens in lms/wsgi.py. That’s this part:

(This link is to Maple, but it hasn’t really changed much in the past decade.)

Sanity check: This error is only happening in your LMS logs, right? Not your CMS logs? If that’s the case, the TLDR is you should probably test commenting out those lines I just quoted above in your staging environment ^.

That call to modulestore() initializes and caches a global instance of the thing that’s going to read and write course content data (there’s a lot of wackiness and compatibility stuff here that I’m just going to gloss over). The important thing was that at some point, this operation was really, really slow, particularly for edX. Because the original modulestore had the entire course content laid out as a set of XML files in various course directories on disk, and those would get completely parsed and read into memory on startup. That is Open edX’s “we’ll just build a prototype to do this one course” roots shining through. Even when Studio came into existence, certain courses on edx.org remained on the old XML file system modulestore until that system was removed sometime in early 2016.

That led to various experiments with preloading. The reason it’s in the lms.wsgi file is to force that expensive initialization to happen when gunicorn starts the process up, before it acknowledges that worker as being ready to serve requests. Because otherwise, the first request to hit it would trigger that initialization, and it would be so slow that the request would time out.

At some point, we experimented with gunicorn’s --preload flag. That would cause gunicorn to do all the startup work before forking out the workers. This seemed like a great idea at the time, since it meant that all that course XML parsing would only happen once, instead of once-per-worker. But we later ran into some really, really hard-to-debug behavior caused by caching and having shared descriptors between parent/child processes, and therefore gave up on it.

I imagine that uwsgi has a similar toggle, though I don’t know what it is off the top of my head. That being said, I don’t think it matters here, because I don’t think the modulestore() init is so expensive as to be worth putting in lms/wsgi.py any longer–I believe it was only an issue because of those XML filesystem based courses, which is why it’s not in cms/wsgi.py (since Studio never interacted with those courses).

So I think this style of modulestore initialization is a relic that was never really cleaned up by edX folks because it was harmless with the gunicorn configuration they used. I think it should be okay to remove from the lms/wsgi.py file now. It’s been sitting there for a decade, so it’s possible that there’s some usage scenario that I’m missing–and you should definitely test it somewhere. But it’s worth trying.

Good luck.

1 Like

@dave

Thanks for letting us know about making sure to set a time for pymongo connection.

I commented out those lines in the lms/wsgi.py file on a staging environment and I didn’t see the pymongo fork warning for the LMS. Yes, we were just seeing that warning on the LMS logs only. Appreciate your support on this!