MongoDB intermittent connection timeout errors in pymongo/topology.py

I’m diagnosing a mostly operational production installation of Lilac, installed with Tutor, running on an AWS EKS Kubernetes 1.21 cluster with MongoDB running as a pod (the Tutor default). The platform has been in production for around 9 months. About a month ago the client began reporting occasional 500 errors, which I eventually traced back to PyMongo (see the example stack trace below). A couple of times a minute, the pymongo topology.py module fails with a timeout error when polling for the single replica set that is currently running.

The MongoDB instance currently contains around 5 GB of data. The client routinely and incrementally adds course content, so its size has grown over time from approximately 1 GB at the initial platform launch in mid-2021. 5 GB seems average at most, so I doubt that it is really warranted to start fiddling with the local MongoDB configuration settings.

Has anyone run into this, and if so, how did you mitigate the problem?

Example stack trace, which is being generated two or three times per minute in the LMS log:

Traceback (most recent call last):     
   File "/openedx/venv/lib/python3.8/site-packages/django/core/handlers/wsgi.py", line 141, in __call__       
     response = self.get_response(request)        
   File "/openedx/venv/lib/python3.8/site-packages/django/core/handlers/base.py", line 75, in get_response    
     response = self._middleware_chain(request)   
   File "/openedx/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 36, in inner      
     response = response_for_exception(request, exc)        
   File "/openedx/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 90, in response_for_exception         
     response = handle_uncaught_exception(request, get_resolver(get_urlconf()), sys.exc_info())     
   File "/openedx/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 129, in handle_uncaught_exception     
     return callback(request, **param_dict)       
   File "/openedx/edx-platform/common/djangoapps/util/views.py", line 95, in wrapper      
     return func(request, *args, **kwargs)        
   File "./lms/djangoapps/static_template_view/views.py", line 125, in render_500         
     return HttpResponseServerError(render_to_string('static_templates/server-error.html', {}, request=request))        
   File "/openedx/edx-platform/common/djangoapps/edxmako/shortcuts.py", line 182, in render_to_string         
     return template.render(dictionary, request)  
   File "/openedx/edx-platform/common/djangoapps/edxmako/template.py", line 82, in render 
     return self.mako_template.render_unicode(**context_dictionary)   
   File "/openedx/venv/lib/python3.8/site-packages/mako/template.py", line 478, in render_unicode   
     return runtime._render(  
   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 878, in _render 
     _render_context(         
   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 920, in _render_context   
     _exec_template(inherit, lclcontext, args=args, kwargs=kwargs)    
   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 947, in _exec_template    
     callable_(context, *args, **kwargs)
   File "/tmp/mako_lms/66556456bab4df73b8f14becaed61676/main.html.py", line 345, in render_body     
     runtime._include_file(context, (static.get_template_path('header.html')), _template_uri, online_help_token=online_help_token)
   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 795, in _include_file     
     callable_(ctx, **kwargs) 
   File "/tmp/mako_lms/66556456bab4df73b8f14becaed61676/gcsi-openedx-theme/lms/templates/header.html.py", line 34, in render_body 
     runtime._include_file(context, (static.get_template_path(relative_path='header/header.html')), _template_uri, online_help_token=online_help_token)         
   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 795, in _include_file     
     callable_(ctx, **kwargs) 
   File "/tmp/mako_lms/66556456bab4df73b8f14becaed61676/gcsi-openedx-theme/lms/templates/header/header.html.py", line 106, in render_body   
     runtime._include_file(context, 'navbar-authenticated.html', _template_uri, online_help_token=online_help_token)    
   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 795, in _include_file     
     callable_(ctx, **kwargs) 
   File "/tmp/mako_lms/66556456bab4df73b8f14becaed61676/gcsi-openedx-theme/lms/templates/header/navbar-authenticated.html.py", line 156, in render_body         
     runtime._include_file(context, 'user_dropdown.html', _template_uri)     
   File "/openedx/venv/lib/python3.8/site-packages/mako/runtime.py", line 795, in _include_file     
     callable_(ctx, **kwargs) 
   File "/tmp/mako_lms/66556456bab4df73b8f14becaed61676/gcsi-openedx-theme/lms/templates/header/user_dropdown.html.py", line 57, in render_body       
     resume_block = retrieve_last_sitewide_block_completed(self.real_user)      
   File "./openedx/core/djangoapps/user_api/accounts/utils.py", line 172, in retrieve_last_sitewide_block_completed     
     item = modulestore().get_item(candidate_block_key, depth=1)      
   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/mixed.py", line 91, in inner  
     retval = func(field_decorator=strip_key_collection, *args, **kwargs)       
   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/mixed.py", line 256, in get_item        
     store = self._get_modulestore_for_courselike(usage_key.course_key)         
   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/mixed.py", line 217, in _get_modulestore_for_courselike     
     if has_locator(store):   
   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/mixed.py", line 215, in <lambda>        
     has_locator = lambda store: store.has_course(locator)  
   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/split.py", line 1189, in has_course   
     course_index = self.get_course_index(course_id, ignore_case)     
   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/split.py", line 308, in get_course_index        
     return self.db_connection.get_course_index(course_key, ignore_case)        
   File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/mongo_connection.py", line 438, in get_course_index       
     return self.course_index.find_one(query)     
   File "/openedx/venv/lib/python3.8/site-packages/mongodb_proxy.py", line 55, in wrapper 
     return func(*args, **kwargs)       
   File "/openedx/venv/lib/python3.8/site-packages/pymongo/collection.py", line 1273, in find_one   
     for result in cursor.limit(-1):    
   File "/openedx/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1156, in next 
     if len(self.__data) or self._refresh():      
   File "/openedx/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1050, in _refresh       
     self.__session = self.__collection.database.client._ensure_session()       
   File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1810, in _ensure_session    
     return self.__start_session(True, causal_consistency=False)      
   File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1763, in __start_session    
     server_session = self._get_server_session()  
   File "/openedx/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1796, in _get_server_session
     return self._topology.get_server_session()   
   File "/openedx/venv/lib/python3.8/site-packages/pymongo/topology.py", line 482, in get_server_session      
     self._select_servers_loop(         
   File "/openedx/venv/lib/python3.8/site-packages/pymongo/topology.py", line 208, in _select_servers_loop    
     raise ServerSelectionTimeoutError( 
 pymongo.errors.ServerSelectionTimeoutError: No servers found yet
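
To put a number on "two or three times per minute", something like the following could be run over a saved copy of the LMS log. This is only a rough sketch: it assumes each log record starts with a timestamp of the form `2022-03-01 12:34:56`, which may not match your logging setup, and `errors_per_minute` is a name made up for this example.

```python
import re
from collections import Counter

# Assumption: log records begin with "YYYY-MM-DD HH:MM:SS"; adjust to your format.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}")

# Match only the final exception line, not the "raise ServerSelectionTimeoutError("
# frame that appears earlier in the same traceback.
ERROR_MARKER = "pymongo.errors.ServerSelectionTimeoutError"


def errors_per_minute(lines):
    """Count exception lines, bucketed by the minute of the last seen timestamp."""
    counts = Counter()
    current_minute = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            # Traceback lines carry no timestamp, so remember the latest one.
            current_minute = match.group(1)
        if ERROR_MARKER in line and current_minute is not None:
            counts[current_minute] += 1
    return counts
```

Feeding it the lines of a log file (for example `errors_per_minute(open("lms.log"))`, where `lms.log` is a hypothetical file name) gives a per-minute count that can be compared before and after any mitigation.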

@lpm0073, we have faced a similar problem. I created one topic on the Tutor forum and one on the MongoDB Developer Community forum about it, but got no response.

To get rid of this error, we installed MongoDB outside of the cluster on a separate server and pointed Tutor at this new MongoDB. We have not encountered the pymongo error once since.

You can give this a try. We are still digging into why this error comes up. Note that this topic is specific to Tutor + K8s rather than to Open edX itself.

We’re running into the ServerSelectionTimeoutError exceptions as well (on Maple, but read on for why you’re equally likely to run into this on Nutmeg).

According to a Stack Overflow discussion on the matter, this appears to be a problem in pymongo that was fixed in version 3.12 of that client library.

Sadly though, edx-platform currently appears to require pymongo<3.11 even in the master branch. This means that it uses a version that is 2½ years old. And since the comment in constraints.txt indicates that tests fail with later pymongo releases, it’s probably not safe to upgrade pymongo via Tutor’s OPENEDX_EXTRA_PIP_REQUIREMENTS setting.
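
For anyone who wants to check which side of that version boundary a given installation is on, here is a minimal sketch. The 3.12 threshold is taken from the Stack Overflow discussion mentioned above and is not something I have verified independently; `parse_version` and `has_reported_fix` are names invented for this example.

```python
import re

# Assumption: the fix landed in 3.12, as reported in the Stack Overflow discussion.
FIXED_IN = (3, 12)


def parse_version(text):
    """Turn a version string such as '3.10.1' into a tuple of ints, (3, 10, 1)."""
    return tuple(int(number) for number in re.findall(r"\d+", text)[:3])


def has_reported_fix(version_text):
    """Return True if the version is at or above the release said to contain the fix."""
    return parse_version(version_text) >= FIXED_IN
```

Inside the LMS container this could be called as `has_reported_fix(pymongo.version)` after `import pymongo`. With the `pymongo<3.11` constraint it should come back `False`.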

Is anyone aware of a workaround that can be applied in the Tutor context, with MongoDB being a part of the Tutor installation?

There’s work actively in progress to upgrade pymongo in edx-platform. It’s not quite done yet, but it’s pretty close. And it looks like it might not be too painful to backport to Nutmeg.

Thanks, that’s encouraging!

Meanwhile, is anyone aware of a workaround to make this go away, short of running MongoDB outside of Tutor? This strikes me as a rather puzzling issue, as we see it on only one of our Tutor-managed Open edX deployments, whereas other Tutor instances — same Tutor version, identical Open edX images — on the exact same Kubernetes cluster (in different namespaces) do not throw this exception.

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Taking the liberty to re-open this topic. I was made aware of this issue via this PR: “fix: add lazy-apps to uwsgi.ini” by viadanna (Pull Request #858, overhangio/tutor on GitHub).

Do you know whether this issue still affects Palm/pymongo==3.13.0?
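
For context on why lazy-apps might matter here: my understanding (an assumption, not something confirmed in this thread) is that without lazy-apps uWSGI loads the application once and then forks its workers, while PyMongo documents that a MongoClient must not be shared across a fork. A rough sketch of the same idea at the application level would be to create the client lazily and recreate it whenever the process ID changes. `PerProcessClient` and the factory argument are hypothetical names, not part of pymongo or edx-platform.

```python
import os


class PerProcessClient:
    """Hold one client per process, recreating it after a fork.

    Sketch only: not thread-safe, so two threads racing on the first
    call could each build a client.
    """

    def __init__(self, factory):
        # factory is any zero-argument callable that builds a client,
        # e.g. lambda: pymongo.MongoClient(host) in a real deployment.
        self._factory = factory
        self._client = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        # A differing PID means we are in a forked child and must not
        # reuse the parent's client.
        if self._client is None or self._pid != pid:
            self._client = self._factory()
            self._pid = pid
        return self._client
```

The lazy-apps setting gets the same effect without code changes, since each worker then loads the application, and therefore builds its own client, after the fork.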

This issue is now tracked here: “MongoDB intermittent connection timeout errors in pymongo/topology.py” (Issue #865, overhangio/tutor on GitHub).