Server down caused by a MongoDB connection problem

Hello everyone,

Our servers went down yesterday due to a MongoDB connection problem:

Mar  1 22:59:39 xyz [service_variant=lms][gunicorn.error][env:sandbox] INFO [xyz  2657] [glogging.py:213] - Worker exiting (pid: 2657)
Mar  1 22:59:39 xyz [service_variant=lms][gunicorn.error][env:sandbox] ERROR [xyz  2728] [glogging.py:219] - Exception in worker process:

  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/pymongo/mongo_client.py", line 425, in __init__
    raise ConnectionFailure(str(e))
ConnectionFailure: [Errno 4] Interrupted system call

The problem appeared in the middle of the night and resolved itself after 4 hours. Still, we would like to be prepared in case it happens again. Has anyone already faced this issue?

I am on Ficus, running on Ubuntu 16.04.

Here is the complete log:

Mar  1 22:59:39 xyz [service_variant=lms][gunicorn.error][env:sandbox] INFO [xyz  2657] [glogging.py:213] - Worker exiting (pid: 2657)
Mar  1 22:59:39 xyz [service_variant=lms][gunicorn.error][env:sandbox] ERROR [xyz  2728] [glogging.py:219] - Exception in worker process:
Traceback (most recent call last):
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/gunicorn/arbiter.py", line 473, in spawn_worker
    worker.init_process()
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/gunicorn/workers/base.py", line 100, in init_process
    self.wsgi = self.app.wsgi()
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/gunicorn/app/base.py", line 106, in wsgi
    self.callable = self.load()
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 27, in load
    return util.import_app(self.app_uri)
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/gunicorn/util.py", line 353, in import_app
    __import__(module)
  File "/edx/app/edxapp/edx-platform/lms/wsgi.py", line 32, in <module>
    modulestore()
  File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/modulestore/django.py", line 242, in modulestore
    settings.MODULESTORE['default'].get('OPTIONS', {})
  File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/modulestore/django.py", line 224, in create_modulestore_instance
    **_options
  File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/modulestore/mixed.py", line 180, in __init__
    signal_handler=signal_handler,
  File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/modulestore/django.py", line 224, in create_modulestore_instance
    **_options
  File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/split.py", line 667, in __init__
    self.db_connection = MongoConnection(**doc_store_config)
  File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/modulestore/split_mongo/mongo_connection.py", line 300, in __init__
    retry_wait_time=retry_wait_time, **kwargs
  File "/edx/app/edxapp/edx-platform/common/lib/xmodule/xmodule/mongo_utils.py", line 42, in connect_to_mongodb
    **kwargs
  File "/edx/app/edxapp/venvs/edxapp/local/lib/python2.7/site-packages/pymongo/mongo_client.py", line 425, in __init__
    raise ConnectionFailure(str(e))
ConnectionFailure: [Errno 4] Interrupted system call

Thanks!

Hi @dimitri-hoareau-WEL,

From the error type, this looks to me like a system problem (as opposed to an Open edX problem) on the machine that runs edx-platform (as opposed to the ones that run MongoDB). Possibly resource contention of some kind.
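The [Errno 4] in the log is EINTR. A blocking system call made during pymongo's connection attempt in the MongoClient constructor was interrupted by a signal before it completed, and pymongo re-raised it as ConnectionFailure. Python 2.7, which this install runs, does not retry such calls on its own (Python 3.5+ does for most system calls, per PEP 475). If you want workers to tolerate an occasional interruption at startup, a generic retry loop is one option. The sketch below is a hypothetical helper (`call_with_retry` and `is_eintr` are illustrative names, not Open edX or pymongo code). It will not help if the signal comes from gunicorn shutting the worker down, so it complements the diagnosis above rather than replacing it.

```python
import errno
import time


def is_eintr(exc):
    """Return True if `exc` carries errno EINTR ("[Errno 4]" on Linux)."""
    return getattr(exc, "errno", None) == errno.EINTR


def call_with_retry(func, should_retry=is_eintr, attempts=3, wait_seconds=0.5):
    """Call func(); on failure, retry while should_retry(exc) is true.

    Hypothetical helper for illustration, not part of Open edX or pymongo.
    Re-raises after `attempts` tries, or immediately on any error that
    should_retry() does not accept.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts or not should_retry(exc):
                raise
            # Short pause before retrying a transient interruption.
            time.sleep(wait_seconds)
```

Because pymongo's ConnectionFailure keeps only the message text (as the log shows), the default errno check would not match it. A caller wrapping MongoClient would need a predicate such as `lambda exc: "[Errno 4]" in str(exc)` instead.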

To diagnose this kind of issue, I have found atop to be extremely useful. This article gives a good practical overview. In short, you enable continuous logging so that later you can “roll back time” to find what process was hogging all the CPU (or RAM, or IO… you get the picture).
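To make the "roll back time" step concrete: once atop's continuous logging is enabled, you replay the raw log for the day of the incident and jump to the minutes around the first error (22:59:39 on Mar 1 in the log above). The sketch below only builds that command. It assumes the Debian/Ubuntu package layout of daily raw files named atop_YYYYMMDD under /var/log/atop (check your own setup), and the year is an assumption because syslog timestamps omit it. In atop, `-r` reads a raw file and `-b`/`-e` limit the replay to a begin and end time; `atop_replay_command` is a hypothetical helper name.

```python
import shlex
from datetime import datetime, timedelta


def atop_replay_command(incident, minutes_before=10, minutes_after=10,
                        log_dir="/var/log/atop"):
    """Return an argv list that replays atop's raw log around `incident`.

    Assumption: daily raw logs are named atop_YYYYMMDD in `log_dir`
    (Debian/Ubuntu default); adjust if your logging is configured differently.
    """
    begin = incident - timedelta(minutes=minutes_before)
    end = incident + timedelta(minutes=minutes_after)
    if begin.date() != end.date():
        # Each raw file covers a single day, so a window crossing midnight
        # has to be replayed from two files.
        raise ValueError("window crosses midnight; replay each day's file separately")
    log_file = "%s/atop_%s" % (log_dir, incident.strftime("%Y%m%d"))
    return ["atop", "-r", log_file,
            "-b", begin.strftime("%H:%M"),
            "-e", end.strftime("%H:%M")]


if __name__ == "__main__":
    # First error in the log: Mar 1 22:59:39 (the year 2017 is an assumption).
    cmd = atop_replay_command(datetime(2017, 3, 1, 22, 59, 39))
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # prints: atop -r /var/log/atop/atop_20170301 -b 22:49 -e 23:09
```

Run the printed command on the edx-platform machine. Inside the replay, `t` steps forward one sample and `T` steps back, so you can walk through the interval and see which process was consuming CPU, memory, or disk when the workers started failing.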
