Migrate MongoDB to AWS DocumentDB

Hi all,

AWS offers a managed MongoDB-compatible service called DocumentDB. Does anybody have experience with it? They say it’s compatible with version 3.4+.

I was planning to move MongoDB out of the native installation. What is the best option? A new Ubuntu instance running only MongoDB? I looked at Atlas, but it looks too expensive.

Is there any guide to connecting to an external MongoDB?

I would suggest you read this before going with DocumentDB

It might be an alternative in the future (the fine folks in the edX engineering team could tell us), but since Juniper runs on MongoDB 3.6, there are major compatibility issues right now.

Interesting article @sambapete! Thanks for finding and sharing.
Looks like the Mongo guys want to defend their $2000/month Atlas against a ~$50/month DocumentDB… The question is how many of the features behind those failing tests one really needs. @regis said that Open edX makes very basic use of Mongo.
As soon as I can, I will give it a try and check with the AWS guys to see what they have to say and whether they can help validate.

I also saw a post in Slack where someone tried DocumentDB recently and ran into issues with $$ROOT:

pymongo.errors.OperationFailure: Feature not supported: $$ROOT


Yes, that was I haha.
It was something we figured we’d try.

The callstack error was:

[2020-10-07 16:23:24,450: INFO/MainProcess] Received task: contentstore.tasks.export_olx[53f3837a-aee1-4eb1-9dcf-dd4bd25de5cb]
[2020-10-07 16:23:24,594: ERROR/Worker-17] contentstore.tasks.export_olx[53f3837a-aee1-4eb1-9dcf-dd4bd25de5cb]: There was an error exporting course-v1:University+testdata-course+199801
Traceback (most recent call last):
  File "/openedx/edx-platform/cms/djangoapps/contentstore/tasks.py", line 287, in create_export_tarball
export_course_to_xml(modulestore(), contentstore(), course_module.id, root_dir, name)
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/xml_exporter.py", line 345, in export_course_to_xml
CourseExportManager(modulestore, contentstore, course_key, root_dir, course_dir).export()
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/xml_exporter.py", line 179, in export
self.process_extra(root, courselike, root_courselike_dir, xml_centric_courselike_key, export_fs)
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/modulestore/xml_exporter.py", line 226, in process_extra
root_courselike_dir + '/policies/assets.json',
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/contentstore/mongo.py", line 208, in export_all_for_course
assets, __ = self.get_all_content_for_course(course_key)
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/contentstore/mongo.py", line 232, in get_all_content_for_course
course_key, start=start, maxresults=maxresults, get_thumbnails=False, sort=sort, filter_params=filter_params
  File "/openedx/venv/lib/python3.5/site-packages/mongodb_proxy.py", line 55, in wrapper
return func(*args, **kwargs)
  File "/openedx/edx-platform/common/lib/xmodule/xmodule/contentstore/mongo.py", line 323, in _get_all_content_for_course
cursor = self.fs_files.aggregate(pipeline_stages)
  File "/openedx/venv/lib/python3.5/site-packages/pymongo/collection.py", line 2380, in aggregate
  File "/openedx/venv/lib/python3.5/site-packages/pymongo/collection.py", line 2299, in _aggregate
retryable=not cmd._performs_write)
  File "/openedx/venv/lib/python3.5/site-packages/pymongo/mongo_client.py", line 1465, in _retryable_read
return func(session, server, sock_info, slave_ok)
  File "/openedx/venv/lib/python3.5/site-packages/pymongo/aggregation.py", line 148, in get_cursor
  File "/openedx/venv/lib/python3.5/site-packages/pymongo/pool.py", line 613, in command
  File "/openedx/venv/lib/python3.5/site-packages/pymongo/network.py", line 167, in command
  File "/openedx/venv/lib/python3.5/site-packages/pymongo/helpers.py", line 159, in _check_command_response
raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: Feature not supported: $$ROOT

It was mentioned that there was use here:

(with $$ROOT)

and here:

(with $setUnion)

I’d like to summarize what we have so far, including a Slack chat with @shadinaif and @irishgordo.

DocumentDB is an AWS service that is supposed to be compatible with MongoDB 3.6.
There is an article on the MongoDB site saying that DocumentDB does not pass 60% of the compatibility tests, while Atlas, the Mongo-maintained service available on AWS, is 100% compliant. The code of the test tool is public; however, it is hard to tell from it exactly what is compatible and what is not.

Although Mongo says in this article that Atlas is cheaper than DocumentDB, if you buy Atlas from the AWS Marketplace you must pay $24,000 upfront for a 1-year credit balance. DocumentDB has pay-per-use billing, which is more suitable for smaller instances. AWS offers a price calculator. The other option is to buy Atlas directly from Mongo under a pay-as-you-go plan, but then the billing will not be integrated with your AWS bill.

There is a complete compatibility matrix of DocumentDB.
@shadinaif found two calls to unsupported features in edx-platform:
1. $$ROOT, which is used here: https://github.com/edx/edx-platform/blob/44562695087b546dd4a26e8993e3cc318efa5a91/common/lib/xmodule/xmodule/contentstore/mongo.py#L312
2. $setUnion, which is used here: https://github.com/edx/edx-platform/blob/44562695087b546dd4a26e8993e3cc318efa5a91/cms/djangoapps/contentstore/management/commands/clean_cert_name.py#L81
But there may well be more calls to other unsupported features, in edx-platform or in other related apps.
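
As a starting point for finding further uses, here is a minimal sketch of a text scan for operator names. The `UNSUPPORTED` tuple only contains the two names mentioned in this thread (the real list would have to come from the DocumentDB compatibility matrix), and `find_unsupported` and the `sample` pipeline are made up for illustration, not taken from edx-platform.

```python
import re

# Assumption: illustrative list only; these two names come from this thread.
UNSUPPORTED = ("$$ROOT", "$setUnion")


def find_unsupported(source, names=UNSUPPORTED):
    """Return (line_number, name) pairs for each listed name found in source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name in names:
            # Match the name only when no further identifier character follows,
            # so "$setUnion" does not match a longer, different name.
            if re.search(re.escape(name) + r"(?![A-Za-z0-9_])", line):
                hits.append((lineno, name))
    return hits


# Hypothetical pipeline text, used only to exercise the scan.
sample = '''pipeline = [
    {"$group": {"_id": None, "results": {"$push": "$$ROOT"}}},
    {"$project": {"names": {"$setUnion": ["$a", "$b"]}}},
]'''

print(find_unsupported(sample))
```

The same function could be applied to the contents of each .py file in a checkout. It only finds literal occurrences, so pipelines assembled dynamically would still need a manual review.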

IMO, DocumentDB is worth a try. But in order to make it work, we should:

  1. Identify exactly which calls are made to unsupported features across the whole codebase
  2. Where possible, work around them by changing the code
  3. Where it is not possible, submit a feature request to the AWS service team (an email address is available)
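
For the workaround step, one idea is to replace "$$ROOT" with an explicit document that lists the fields you need. The sketch below is untested against DocumentDB: whether it accepts the rewritten stage is an assumption, and `replace_root`, the example pipeline, and the field names are all hypothetical rather than the actual edx-platform code.

```python
def replace_root(node, fields):
    """Return a copy of a pipeline (or fragment) with "$$ROOT" replaced
    by an explicit {field: "$field"} document."""
    if node == "$$ROOT":
        return {name: "$" + name for name in fields}
    if isinstance(node, dict):
        return {key: replace_root(value, fields) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_root(item, fields) for item in node]
    # Scalars, including other strings such as "$displayname", are kept as-is.
    return node


# Hypothetical stage that collects whole documents with $$ROOT.
pipeline = [{"$group": {"_id": None, "results": {"$push": "$$ROOT"}}}]

rewritten = replace_root(pipeline, ["_id", "displayname"])
print(rewritten)
```

This only handles a bare "$$ROOT" value. Paths such as "$$ROOT.field" are left untouched, and the field list has to cover everything the calling code reads from the results.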

Thanks for your contribution, and welcome to the community!!

Meanwhile, you can set up MongoDB on a separate server and connect edX to it :) It would be easy to set up.
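
If you go that route, the connection string for the external server is one thing that is easy to get wrong when the password contains special characters. Here is a small sketch that builds one with the standard library. The host, user, password, and database name are placeholders, `build_mongo_uri` is a made-up helper, and where Open edX reads its Mongo settings from depends on how you deployed it, so that part is not shown.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port=27017, user=None, password=None, database=""):
    """Build a mongodb:// URI, percent-encoding the credentials."""
    credentials = ""
    if user is not None and password is not None:
        # Characters such as "@" and "/" in credentials must be escaped.
        credentials = "{}:{}@".format(quote_plus(user), quote_plus(password))
    return "mongodb://{}{}:{}/{}".format(credentials, host, port, database)


# Placeholder values, not a real server.
uri = build_mongo_uri(
    "mongo.example.internal", user="edxapp", password="p@ss/word", database="edxapp"
)
print(uri)
```

The resulting string can be passed to pymongo's MongoClient from the application host to check that the external server is reachable before changing any platform settings.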

True, that is definitely an option.
The one thing DocumentDB offers is that it’s fully managed by AWS, so you don’t have to manage the MongoDB instances in the cluster yourself.

And it’s much cheaper than Atlas, so it is within reach of more communities/organizations/individuals that want a scalable, highly available MongoDB datastore.