MongoDB intermittent connection timeout errors in pymongo/

I’m diagnosing a mostly operational production installation of Lilac, installed with Tutor, running on an AWS EKS Kubernetes 1.21 cluster with MongoDB running as a pod (the Tutor default). The platform has been in production for around 9 months. Around a month ago the client began reporting occasional 500 errors, which I eventually traced back to PyMongo. See the example stack trace below. A couple of times a minute, the pymongo module fails with a timeout error when polling for the single replica set that is currently running.

The MongoDB instance currently contains around 5 GB of data. The client routinely and incrementally adds course content, so its size has grown over time from approximately 1 GB at the initial platform launch in mid-2021. 5 GB seems average at most, so I doubt that it is really warranted to start fiddling with the local MongoDB configuration settings.

Has anyone run into this, and if so, how did you mitigate the problem?

Example stack trace, which is being generated two or three times per minute in the LMS log:

Traceback (most recent call last):
File “/openedx/venv/lib/python3.8/site-packages/django/core/handlers/”, line 141, in call
response = self.get_response(request)
File “/openedx/venv/lib/python3.8/site-packages/django/core/handlers/”, line 75, in get_response
response = self._middleware_chain(request)
File “/openedx/venv/lib/python3.8/site-packages/django/core/handlers/”, line 36, in inner
response = response_for_exception(request, exc)
File “/openedx/venv/lib/python3.8/site-packages/django/core/handlers/”, line 90, in response_for_exception
response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())
File “/openedx/venv/lib/python3.8/site-packages/django/core/handlers/”, line 129, in handle_uncaught_exception
return callback(request, **param_dict)
File “/openedx/edx-platform/common/djangoapps/util/”, line 95, in wrapper
return func(request, *args, **kwargs)
File “./lms/djangoapps/static_template_view/”, line 125, in render_500
return HttpResponseServerError(render_to_string(‘static_templates/server-error.html’, {}, request=request))
File “/openedx/edx-platform/common/djangoapps/edxmako/”, line 182, in render_to_string
return template.render(dictionary, request)
File “/openedx/edx-platform/common/djangoapps/edxmako/”, line 82, in render
return self.mako_template.render_unicode(**context_dictionary)
File “/openedx/venv/lib/python3.8/site-packages/mako/”, line 478, in render_unicode
return runtime._render(
File “/openedx/venv/lib/python3.8/site-packages/mako/”, line 878, in _render
File “/openedx/venv/lib/python3.8/site-packages/mako/”, line 920, in _render_context
_exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
File “/openedx/venv/lib/python3.8/site-packages/mako/”, line 947, in exec_template
(context, *args, **kwargs)
File “/tmp/mako_lms/66556456bab4df73b8f14becaed61676/”, line 345, in render_body
runtime._include_file(context, (static.get_template_path(‘header.html’)), _template_uri, online_help_token=online_help_token)
File “/openedx/venv/lib/python3.8/site-packages/mako/”, line 795, in include_file
(ctx, **kwargs)
File “/tmp/mako_lms/66556456bab4df73b8f14becaed61676/gcsi-openedx-theme/lms/templates/”, line 34, in render_body
runtime._include_file(context, (static.get_template_path(relative_path=‘header/header.html’)), _template_uri, online_help_token=online_help_token)
File “/openedx/venv/lib/python3.8/site-packages/mako/”, line 795, in include_file
(ctx, **kwargs)
File “/tmp/mako_lms/66556456bab4df73b8f14becaed61676/gcsi-openedx-theme/lms/templates/header/”, line 106, in render_body
runtime._include_file(context, ‘navbar-authenticated.html’, _template_uri, online_help_token=online_help_token)
File “/openedx/venv/lib/python3.8/site-packages/mako/”, line 795, in include_file
(ctx, **kwargs)
File “/tmp/mako_lms/66556456bab4df73b8f14becaed61676/gcsi-openedx-theme/lms/templates/header/”, line 156, in render_body
runtime._include_file(context, ‘user_dropdown.html’, _template_uri)
File “/openedx/venv/lib/python3.8/site-packages/mako/”, line 795, in include_file
(ctx, **kwargs)
File “/tmp/mako_lms/66556456bab4df73b8f14becaed61676/gcsi-openedx-theme/lms/templates/header/”, line 57, in render_body
resume_block = retrieve_last_sitewide_block_completed(self.real_user)
File “./openedx/core/djangoapps/user_api/accounts/”, line 172, in retrieve_last_sitewide_block_completed
item = modulestore().get_item(candidate_block_key, depth=1)
File “/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/”, line 91, in inner
retval = func(field_decorator=strip_key_collection, *args, **kwargs)
File “/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/”, line 256, in get_item
store = self._get_modulestore_for_courselike(usage_key.course_key)
File “/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/”, line 217, in _get_modulestore_for_courselike
if has_locator(store):
File “/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/”, line 215, in
has_locator = lambda store: store.has_course(locator)
File “/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/”, line 1189, in has_course
course_index = self.get_course_index(course_id, ignore_case)
File “/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/”, line 308, in get_course_index
return self.db_connection.get_course_index(course_key, ignore_case)
File “/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/”, line 438, in get_course_index
return self.course_index.find_one(query)
File “/openedx/venv/lib/python3.8/site-packages/”, line 55, in wrapper
return func(*args, **kwargs)
File “/openedx/venv/lib/python3.8/site-packages/pymongo/”, line 1273, in find_one
for result in cursor.limit(-1):
File “/openedx/venv/lib/python3.8/site-packages/pymongo/”, line 1156, in next
if len(self.__data) or self._refresh():
File “/openedx/venv/lib/python3.8/site-packages/pymongo/”, line 1050, in _refresh
self.__session = self.__collection.database.client._ensure_session()
File “/openedx/venv/lib/python3.8/site-packages/pymongo/”, line 1810, in _ensure_session
return self.__start_session(True, causal_consistency=False)
File “/openedx/venv/lib/python3.8/site-packages/pymongo/”, line 1763, in __start_session
server_session = self._get_server_session()
File “/openedx/venv/lib/python3.8/site-packages/pymongo/”, line 1796, in _get_server_session
return self._topology.get_server_session()
File “/openedx/venv/lib/python3.8/site-packages/pymongo/”, line 482, in get_server_session
File “/openedx/venv/lib/python3.8/site-packages/pymongo/”, line 208, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: No servers found yet
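
Since the error says that no server could be selected, one way to narrow this down without involving pymongo at all is to time raw TCP connects to the MongoDB service from inside the LMS pod. The following is only a rough sketch: the function names are made up for illustration, and the host and port you pass in are assumptions you would replace with whatever your LMS settings actually point at.

```python
import socket
import time
from typing import List, Optional


def probe_tcp(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connect failed."""
    start = time.monotonic()
    try:
        # create_connection() also resolves the host name, so slow or failing
        # DNS lookups inside the cluster show up here as well.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return None


def probe_many(
    host: str, port: int, attempts: int = 10, pause: float = 0.0
) -> List[Optional[float]]:
    """Probe repeatedly and collect one sample (or None) per attempt."""
    samples: List[Optional[float]] = []
    for _ in range(attempts):
        samples.append(probe_tcp(host, port))
        time.sleep(pause)
    return samples
```

For example, `probe_many("mongodb", 27017, attempts=60, pause=1.0)` would sample once a second for a minute, assuming the service is reachable as `mongodb` on port 27017. Any `None` entries, or connect times that spike, would point at networking or DNS inside the cluster rather than at MongoDB itself.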

@lpm0073, we have faced a similar problem. I created one topic on the Tutor forum and one on the MongoDB Developer Community forum about it, but got no response.

To get rid of this error, we installed MongoDB outside of the cluster on a separate server and pointed Tutor at this new MongoDB. We have not encountered the pymongo error once since.

You can give this a try. We are still digging into why this error comes up. Note that this topic is specific to Tutor + K8s rather than to Open edX.

We’re running into the ServerSelectionTimeoutError exceptions as well (on Maple, but read on for why you’re equally likely to run into this on Nutmeg).

According to a Stack Overflow discussion on the matter, this appears to be a problem in pymongo that was fixed in version 3.12 of that client library.

Sadly though, edx-platform currently appears to require pymongo<3.11 even in the master branch. This means that it uses a version that is 2½ years old. And since the comment in constraints.txt indicates that tests fail with later pymongo releases, it’s probably not safe to upgrade pymongo via Tutor’s OPENEDX_EXTRA_PIP_REQUIREMENTS setting.
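
To confirm which pymongo version a given LMS container actually runs, something like the following could be executed in a Python shell inside the pod. It is a minimal sketch: the helper names are invented for illustration, and the 3.12 threshold is only what the Stack Overflow discussion suggests, not something I have verified against the pymongo changelog.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Assumption: the fix landed in 3.12, as suggested by the Stack Overflow thread.
FIXED_IN: Tuple[int, int] = (3, 12)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.10.1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def includes_reported_fix(version: str) -> bool:
    """True if the version is at or above the release said to contain the fix."""
    return parse_major_minor(version) >= FIXED_IN


def installed_pymongo_version() -> Optional[str]:
    """Return the installed pymongo version, or None if it is not installed."""
    try:
        return metadata.version("pymongo")
    except metadata.PackageNotFoundError:
        return None
```

With the pymongo<3.11 constraint in place, `includes_reported_fix(installed_pymongo_version())` would be expected to return `False` in an unmodified image.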

Is anyone aware of a workaround that can be applied in the Tutor context, with MongoDB being a part of the Tutor installation?

There’s work actively in progress to upgrade pymongo in edx-platform. It’s not quite done yet, but it’s pretty close. And it looks like it might not be too painful to backport to Nutmeg.

Thanks, that’s encouraging!

Meanwhile, is anyone aware of a workaround to make this go away, short of running MongoDB outside of Tutor? This strikes me as a rather puzzling issue, as we see it on only one of our Tutor-managed Open edX deployments, whereas other Tutor instances — same Tutor version, identical Open edX images — on the exact same Kubernetes cluster (in different namespaces) do not throw this exception.
