If you run Open edX on PostgreSQL (or have tried to), could you please write a bit about your experiences here? There have been recent discussions in areas like full text search or specifying collation settings in fields that could potentially couple us more tightly to MySQL. We can maintain abstractions to try to avoid that, but that comes at a cost, and I’m trying to understand to what extent PostgreSQL support is valuable to folks.
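To make the cost of "maintaining abstractions" a bit more concrete, here is a minimal sketch of what a database-neutral full text index helper could look like. It is not code from edx-platform: the function name, the index naming scheme, and the `'english'` text search configuration are all assumptions chosen for illustration. The vendor strings are the values Django reports in `connection.vendor`.

```python
# Hypothetical helper, not taken from any Open edX repo.
# Table, column, and index names are illustrative only.
def fulltext_index_sql(vendor: str, table: str, column: str) -> str:
    """Return vendor-specific DDL that creates a full text index."""
    index_name = f"{table}_{column}_fts"
    if vendor == "mysql":
        # MySQL has a dedicated FULLTEXT index type.
        return f"CREATE FULLTEXT INDEX {index_name} ON {table} ({column})"
    if vendor == "postgresql":
        # PostgreSQL indexes a tsvector expression, typically with GIN.
        # The 'english' configuration is an assumption for this sketch.
        return (
            f"CREATE INDEX {index_name} ON {table} "
            f"USING GIN (to_tsvector('english', {column}))"
        )
    raise NotImplementedError(f"No full text index strategy for {vendor!r}")
```

Every feature like this needs one branch per supported database, plus tests for each branch, which is the ongoing cost I'm weighing.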
Full disclosure: My personal position for a while has been that Postgres is a better database, both on its own merits and in terms of its support in Django, but that the benefits don’t outweigh the switching costs. Or at least, the work necessary to safely migrate everything is not more important than the dozens of other things we could do with that developer/ops time and effort.
That being said, I have no idea how many folks are actually running PostgreSQL in production already, how important it is for folks who want to run it, etc.
We don’t have any clients that run Open edX on Postgres, but… it would be nice to leave that door open.
I dream of a world where we can run the stack with only a single relational database service (and without mongo… unrelated thread, but here are links in case people want them, see Store modulestore’s course indexes in Django/MySQL and Replace cs_comments_service with pluggable alternatives). One of the working discussion plugins is Discourse, which requires Postgres, and it would be so fantastic if one day we could run both Open edX and Discourse on the same SQL database service.
I just want to add my personal opinion on this:
It’s also worth considering that the time and effort needed to support PostgreSQL is a setup cost, meaning it occurs once, while the benefit of using PostgreSQL is continuous. So the tradeoff might be worth it from a long-term perspective.
I respectfully disagree with that. PostgreSQL would be a one-time setup cost if we could pull a trigger and migrate everyone over in a relatively short period of time. That’s how I would think of it if this were a commercial codebase at my company. But I believe that as an open source project, we’d be supporting both databases simultaneously for multiple Open edX releases. We would only really be able to enjoy the benefits of Postgres when MySQL was completely dropped for a given service, and we could stop distorting our schema to appease it. Maybe it’s still worth it, but we’re probably looking at a years-long transition to ensure that we’re not leaving behind big chunks of the community.
I can see at least four paths:
1. Status quo: MySQL (+ MariaDB?) is official and well tested, but aim for PostgreSQL compatibility and accept patches to fix any issues.
2. Fully support both databases, by increasing our testing with PostgreSQL.
3. Lean into MySQL, by allowing more MySQL-specific features via libraries like django-mysql.
4. Lean into PostgreSQL, with an eye towards long-term migration.
In addition, we could try to fund development efforts to improve Django’s support for MySQL, which would fit for approaches 1-3.
So it turns out that we added django-mysql as an edx-platform dependency back in 2018, so it’s extremely unlikely that PostgreSQL has worked for the LMS or Studio at all since that time.
Edit: After reading a little more, it looks like we use django-mysql just to use its ListCharField, which seems to just build on CharField, and shouldn’t lock out other databases (which makes sense, since the tests are backed with in-memory SQLite).
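To illustrate why a field like this doesn't have to tie you to one database, here is a simplified sketch of the general idea: a list stored as a single comma-joined string in an ordinary character column. This is not django-mysql's actual implementation, and the function names are made up for the example. The only thing the database ever sees is a plain string.

```python
from __future__ import annotations


# Hypothetical helpers illustrating the storage idea only;
# this is not django-mysql's real code.
def list_to_db(values: list[str]) -> str:
    """Serialize a list of strings into one comma-joined string."""
    for value in values:
        if "," in value:
            # A comma inside an item would be indistinguishable
            # from the separator when reading the value back.
            raise ValueError(f"Item may not contain a comma: {value!r}")
    return ",".join(values)


def db_to_list(stored: str) -> list[str]:
    """Parse the stored string back into a list of strings."""
    return stored.split(",") if stored else []
```

Since the conversion happens in Python on either side of a plain character column, any backend with a character column type can hold the stored value, which matches the tests passing on SQLite.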
As part of ongoing efforts to enhance Open edX compatibility with PostgreSQL, I have dedicated time to address and update migrations, ensuring they work seamlessly with both PostgreSQL and MySQL.
I have submitted the following pull requests to support this initiative:
@qasimgulzar: I’m excited to see that someone is working on PostgreSQL support. But before we start merging this into the various repos it needs to go into, I’d like to understand your long-term goals and the maintenance implications.
Many (most?) installations of Open edX will continue to run MySQL for a long time to come, possibly indefinitely. If you’re envisioning this as a first step towards an eventual forced migration from MySQL to PostgreSQL for the community at large, then I think we need to have a larger conversation before this goes any further.
If this is envisioned as supporting both database backends concurrently, then I’d like to understand the maintenance plan. I’m going to assume that part of this effort would take the form of a PostgreSQL plugin for Tutor. We’d probably also add postgres to our testing matrix in GitHub. But we’d still be in a place where the majority of Open edX sites, developers, and the standard release testing process would use MySQL for the foreseeable future.
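For the testing matrix piece, I'm picturing something roughly like the sketch below: the CI job sets an environment variable, and a test settings module picks the Django database engine from it. The variable names (`TEST_DB_BACKEND` and friends), the default database name, and the function itself are hypothetical. Only the two `ENGINE` strings are Django's real built-in backend paths.

```python
import os

# Django's built-in backend paths, paired with each server's default port.
ENGINES = {
    "mysql": ("django.db.backends.mysql", "3306"),
    "postgresql": ("django.db.backends.postgresql", "5432"),
}


def database_settings(env=None):
    """Build a Django DATABASES dict from environment variables.

    All TEST_DB_* variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    backend = env.get("TEST_DB_BACKEND", "mysql")  # MySQL stays the default
    if backend not in ENGINES:
        raise ValueError(f"Unsupported test database backend: {backend!r}")
    engine, default_port = ENGINES[backend]
    return {
        "default": {
            "ENGINE": engine,
            "NAME": env.get("TEST_DB_NAME", "openedx_test"),
            "USER": env.get("TEST_DB_USER", ""),
            "PASSWORD": env.get("TEST_DB_PASSWORD", ""),
            "HOST": env.get("TEST_DB_HOST", "localhost"),
            "PORT": env.get("TEST_DB_PORT", default_port),
        }
    }
```

A test settings file would then do `DATABASES = database_settings()`, and each matrix entry would differ only in the value of the backend variable.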
Is Arbisoft willing to maintain that Tutor plugin and commit to testing with PostgreSQL for every release? Will you be actively using PostgreSQL in production on some of your own instances?
It’s great to reconnect with you! These changes represent the initial steps in making the edx-platform fully compatible with both PostgreSQL and MySQL. In the coming weeks, I plan to introduce new GitHub Actions to support PostgreSQL tests in subsequent PRs.
While I am no longer with Arbisoft, I am contributing these enhancements as an individual contributor. My long-term goal is to take ownership of these updates to ensure their continued improvement.
Additionally, I am planning to author a Tutor plugin to facilitate PostgreSQL support.
All right, that sounds good. I created an issue in openedx-learning describing what would need to be done to make that repo support PostgreSQL. I’ll start reviewing your PRs this week.