Nutmeg introduced a new bug in outlines where the underlying issue was our use of the utf8 character set encoding for our MySQL tables.
The problem: MySQL’s utf8 encoding isn’t “real” UTF-8. It stores at most three bytes per character, so it is missing certain characters, including emojis. We want to upgrade to utf8mb4, which is MySQL’s real UTF-8 encoding. This has been an issue for a while, and it is only going to bite us more over time as some of our content storage gets shifted into MySQL.
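To make the three-byte limit concrete, here is a minimal Python sketch. The helper name `needs_utf8mb4` and the sample strings are made up for illustration; the only fact it relies on is that characters above U+FFFF take four bytes in UTF-8, which MySQL’s utf8 cannot store.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if text has a code point above U+FFFF (4 bytes in UTF-8)."""
    return any(ord(ch) > 0xFFFF for ch in text)


# Hypothetical section titles, not taken from any real course.
samples = ["Intro to Python", "Cours de français", "Week 1 🎉"]

for title in samples:
    # Width in bytes of the widest single character once encoded as UTF-8.
    widest = max(len(ch.encode("utf-8")) for ch in title)
    print(f"{title!r}: widest char = {widest} bytes, "
          f"needs utf8mb4 = {needs_utf8mb4(title)}")
```

Something along these lines could also be used to scan existing content for values that would fail to save into a utf8 column.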
Some relevant pieces that make a migration interesting:
- In addition to targeting utf8mb4 for the charset encoding, we probably want to switch the collation to the new MySQL 8.0.1 default of utf8mb4_0900_ai_ci. This collation is unavailable in MySQL 5.7.
- We need to complete the migration from MySQL 5.7 to 8.0 at some point soon (Olive?), because 5.7’s end of life is Oct. 2023. If that is going to be a disruptive change anyway, we might try to bundle encoding changes at the same time.
- In theory, edx-platform can run on either MySQL 5.7 or 8.0 as of Nutmeg, thanks to work done by 2U. To my knowledge, though, nobody has actually taken the plunge and tried MySQL 8.0 yet.
- The migration path I’m aware of involves backup → export → import, which may not be feasible for people running large sites.
- I don’t know if there’s any plausible way to run in a “mixed mode” to do gradual rollouts, since Django settings also have to be tweaked to transmit data using utf8mb4.
- Many fields were created as varchar(255) because utf8’s 3-byte character encoding means that it’s the largest char field you can make that still fits under InnoDB’s 767-byte index limit. Newer tables using the DYNAMIC row format can go over this, but doing so may have performance implications.
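The arithmetic behind the varchar(255) point in the last bullet can be sketched as below. The function and constant names are invented for this example; the 3072-byte figure is the InnoDB index key prefix limit for DYNAMIC and COMPRESSED row formats at the default 16KB page size.

```python
# InnoDB index key prefix limits in bytes (default 16KB page size).
LEGACY_LIMIT_BYTES = 767    # REDUNDANT / COMPACT row formats
DYNAMIC_LIMIT_BYTES = 3072  # DYNAMIC / COMPRESSED row formats

# Worst-case bytes per character that MySQL reserves for each charset.
MAX_BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def max_indexed_chars(charset: str, limit_bytes: int = LEGACY_LIMIT_BYTES) -> int:
    """Largest varchar length whose full value fits in one index key."""
    return limit_bytes // MAX_BYTES_PER_CHAR[charset]


for charset in ("utf8", "utf8mb4"):
    print(charset,
          "legacy:", max_indexed_chars(charset),
          "dynamic:", max_indexed_chars(charset, DYNAMIC_LIMIT_BYTES))
```

Under the legacy limit, utf8 allows 255 indexed characters while utf8mb4 allows only 191, so any indexed varchar(255) column on a non-DYNAMIC table would need attention during a conversion.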
In an ideal world, it would be great to have something that does a seamless, zero-downtime upgrade. One step down from that is maybe a Tutor script that can do the upgrade process with some downtime? A first step might be to at least make it so that Tutor dev on nightly runs MySQL 8.0 with the new charset and collation settings.
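For the Django side of those charset and collation settings, a rough sketch of what a dev environment might use is below. This is an assumption about how it could look, not what Tutor or edx-platform actually ships: the database name, user, and host are placeholders, and only the `charset` connection option and the `TEST` charset/collation keys are standard Django MySQL settings.

```python
# Sketch of Django DATABASES settings for a MySQL 8.0 dev environment.
# NAME, USER, PASSWORD, and HOST are placeholders, not real Open edX values.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "openedx",
        "USER": "openedx",
        "PASSWORD": "change-me",
        "HOST": "mysql",
        "PORT": "3306",
        "OPTIONS": {
            # Ask the MySQL client library to talk to the server in utf8mb4.
            "charset": "utf8mb4",
        },
        "TEST": {
            # Create the test database with the new charset and collation.
            "CHARSET": "utf8mb4",
            "COLLATION": "utf8mb4_0900_ai_ci",
        },
    }
}
```

Note that this only affects the connection and newly created test databases. Existing tables keep whatever charset and collation they were created with until they are converted.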
If you have thoughts on this, or have experience running Open edX on MySQL 8.0, I’d love to hear from you.
Thanks folks.