Hello. At my company wanted to use open edX as our LMS. We wanted to perform some analytics on the content, so that we can have more visibility about, for example, which courses were seen more than others. For this, we wanted to access the DBs of the deployment to access this data. The thing is that open edX uses many different databases (mongodb, mysql, redis, elasticsearch), and we don’t know what is stored in each of them. Is it documented somewhere what is the purpose of each of these DBs?
I’m not sure if/where it’s documented, but here’s a high level summary:
MySQL: most data not listed anywhere else. User accounts, enrollments, grades, etc.
MongoDB: course content and course static assets. Also holds data for the built-in discussions if those are in use. We are trying to remove MongoDB and put everything into MySQL in future versions of the platform.
Redis/memcache: used as a cache, for storing user sessions, block structures, and other temporary data.
ElasticSearch: search indexes that power various search experiences like searching for courses or searching through course content.
Tracking logs: important log files and events that get written to files on disk, and which can also be streamed into an analytics service or data lake.
Typically you only need to back up MySQL, MongoDB, and the logs, as the data in the other systems is ephemeral or can be re-generated from the MySQL+MongoDB data.
Keep in mind that analytics based on the underling database content may break as the format of the data can be changed anytime, though the core tables tend to be very stable.
What sort of analytics do you want? Probably someone has had similar needs before and built some scripts or tools that you can use without having to reinvent the wheel. tCRIL is also developing a comprehensive analytics system called OARS but it has only recently started development.
Hello Braden. Thanks for your answer. For example, one of the analytics we want to perform consists of reporting to how much a video of a course has been watched along the time. Is this information more likely to appear in any of those databases? Sounds like something like this can be found in MongoDB?
Thanks for your answer and my apologies for my late reply.
The information available in “edX Insights” (the first link you sent) probably contains what we are looking for, but it looks a bit over-killing and as you said expensive, so probably it is not going to be our first option.
This cairn option might have what we are looking for. I will investigate more about it. I will also see if I find this information of “how many users have seen a determined course video” in the tracking logs.