Databases
I would like to distinguish two concepts here that are often mixed up:
- the database engine, of which we have four: MySQL, MongoDB, Redis, Elasticsearch
- the database itself, as the logical organization of data inside each database engine. There can be more than one in each engine.
This distinction is important because we wanted to minimize infrastructure resources (and therefore costs) by sharing as much infrastructure as possible.
Modern database engines are powerful pieces of software that can handle huge amounts of data. So the idea is to have one database engine of each type for the whole cluster, and one database for each Open edX instance in the cluster (that is, one per k8s namespace).
Most of our findings are about AWS services for the database engines.
MySQL
AWS offers two options: RDS for MySQL and Aurora MySQL. Aurora should be the best option, as it is fully MySQL compatible, has lower costs and is optimized for AWS cloud resources. It can be configured to have multiple read replicas in several AZs.
We are creating one database for each Open edX installation.
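As a minimal sketch of the one-database-per-instance idea, the snippet below derives a MySQL database name from a k8s namespace and builds the matching CREATE DATABASE statement. The `_openedx` suffix and the function names are hypothetical, chosen only for illustration; per-instance users and grants are left out.

```python
import re

# Kubernetes namespace names are DNS labels: lowercase letters, digits, hyphens.
_NAMESPACE_PATTERN = re.compile(r"[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?")

# MySQL limits database names to 64 characters.
_MYSQL_MAX_NAME_LENGTH = 64


def mysql_database_name(namespace: str) -> str:
    """Map a k8s namespace to a database name (hypothetical naming scheme)."""
    if not _NAMESPACE_PATTERN.fullmatch(namespace):
        raise ValueError(f"not a valid k8s namespace: {namespace!r}")
    # Namespaces never contain underscores, so this mapping cannot collide.
    name = namespace.replace("-", "_") + "_openedx"
    if len(name) > _MYSQL_MAX_NAME_LENGTH:
        raise ValueError(f"database name too long for MySQL: {name!r}")
    return name


def create_database_sql(namespace: str) -> str:
    """Build the statement that creates the database for one instance."""
    return f"CREATE DATABASE IF NOT EXISTS `{mysql_database_name(namespace)}`;"


if __name__ == "__main__":
    for ns in ("campus-a", "campus-b"):
        print(create_database_sql(ns))
```

Each statement would be run once against the shared engine when a new instance is added to the cluster.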
Regarding backup, AWS backup is great but works at the database engine level (which again gets confused with the database concept). I asked the AWS guys about this, and it looks like they do not currently support per-database backup. This means that if you have multiple Open edX instances and you want to restore only one, you cannot: you have to restore them all. So, for a one-by-one backup, the old dump-and-backup-to-S3 strategy is the only option.
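The dump-and-backup-to-S3 strategy could look roughly like the sketch below, which only builds the shell pipeline for one database and does not run it. The host, bucket name and S3 key layout are assumptions for illustration, and MySQL credentials are assumed to come from an option file rather than the command line.

```python
import datetime
import shlex


def dump_to_s3_pipeline(database, host, bucket, now=None):
    """Return a shell pipeline that dumps one database and streams it to S3.

    The bucket and key layout are hypothetical; adapt them to your setup.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    key = f"mysql/{database}/{database}-{stamp}.sql.gz"

    # --single-transaction takes a consistent snapshot of InnoDB tables
    # without locking them; --databases limits the dump to this database.
    dump = ["mysqldump", f"--host={host}", "--single-transaction",
            "--databases", database]
    compress = ["gzip"]
    # "aws s3 cp -" reads the object body from standard input.
    upload = ["aws", "s3", "cp", "-", f"s3://{bucket}/{key}"]

    return " | ".join(shlex.join(cmd) for cmd in (dump, compress, upload))


if __name__ == "__main__":
    # Run the result with "bash -o pipefail -c ..." so that a failed dump
    # is not silently uploaded as a truncated backup.
    print(dump_to_s3_pipeline("campus_a_openedx", "db.example.internal",
                              "my-backups"))
```

Since each dump contains a single database, one Open edX instance can be restored without touching the others.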
MongoDB
AWS offers some options:
- DynamoDB: Don’t even try. It’s not compatible with MongoDB.
- DocumentDB: There is an old discussion about this. Although they say it’s compatible with MongoDB, it is not 100% compatible. In a recent talk with one of the AWS engineers, they confirmed that these incompatibilities are still present.
- Atlas: It’s the official MongoDB service, so it should be 100% compatible. Too expensive for us to try.
- MongoDB quickstart: it’s a CloudFormation template that creates a MongoDB cluster in your VPC. It’s the option we chose, and so far it has worked well.
Open edX requires one database for the modulestore. If you are using the forum, it will require another database.
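To make the two databases per instance concrete, here is a small sketch that names them after the k8s namespace and builds a connection URI for each. The `_modulestore` and `_forum` suffixes, the host name and the function names are hypothetical.

```python
def mongodb_databases(namespace, use_forum=True):
    """Return the MongoDB database names for one Open edX instance.

    The naming scheme is an assumption made for this example.
    """
    prefix = namespace.replace("-", "_")
    names = {"modulestore": f"{prefix}_modulestore"}
    if use_forum:
        # The forum database is only needed when the forum is enabled.
        names["forum"] = f"{prefix}_forum"
    return names


def mongodb_uri(host, database, port=27017):
    """Build a standard MongoDB connection URI for one database."""
    return f"mongodb://{host}:{port}/{database}"


if __name__ == "__main__":
    for purpose, name in mongodb_databases("campus-a").items():
        print(purpose, mongodb_uri("mongo.example.internal", name))
```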
As MongoDB is close to being deprecated in Open edX, I wouldn’t spend much effort on it.
Redis
Redis is used for two main purposes in Open edX: as a cache and as the backend for Celery queues.
The concept of a database in Redis is weird: there are no named databases, only a numeric index that goes from 0 to 15 by default. As we need two per Open edX instance, we have a limit of 8 Open edX instances per Redis instance.
If anyone knows how to overcome this limitation, it would be much appreciated!
We’ve been using AWS ElastiCache for Redis and it has worked well so far.
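The index arithmetic behind the limit of 8 instances can be sketched as follows. The idea of a per-instance "slot" number and the function names are assumptions for illustration; the sketch only shows how 16 indexes split into pairs.

```python
REDIS_DATABASES = 16   # default number of Redis databases (indexes 0 to 15)
DBS_PER_INSTANCE = 2   # one for the cache, one for the Celery queues
MAX_INSTANCES = REDIS_DATABASES // DBS_PER_INSTANCE  # 16 // 2 = 8


def redis_indexes(slot):
    """Return the Redis database indexes for the instance in a given slot.

    Slots are numbered 0 to MAX_INSTANCES - 1 (a hypothetical convention).
    """
    if not 0 <= slot < MAX_INSTANCES:
        raise ValueError(f"slot must be between 0 and {MAX_INSTANCES - 1}")
    base = slot * DBS_PER_INSTANCE
    return {"cache": base, "celery": base + 1}


def redis_url(host, index, port=6379):
    """Build a redis:// URL that selects the given database index."""
    return f"redis://{host}:{port}/{index}"


if __name__ == "__main__":
    for slot in range(MAX_INSTANCES):
        indexes = redis_indexes(slot)
        print(slot, redis_url("redis.example.internal", indexes["cache"]),
              redis_url("redis.example.internal", indexes["celery"]))
```

The ninth instance (slot 8) has no free pair of indexes left, which is where a second Redis instance becomes necessary under this scheme.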
Elasticsearch
We’ve been trying AWS OpenSearch. It is probably the most expensive service, and it is difficult to scale. We had to create a new domain for each Open edX instance, which creates a new set of resources.
So if anybody has found a better way to scale this service, that would again be much appreciated.