Palm upgrade, running RDBMS in Tutor, MySQL vs. MariaDB, utf8 vs. utf8mb4

Hi everyone,

apologies for the hodgepodge of a subject; I am trying to wrap my head around a few things as I write this. :slight_smile:

We are currently in the process of planning our transition to Palm. All our production platforms are upgrades that were originally installed multiple releases back; we are not launching any new ones on Palm. We run with tutor k8s, with RUN_MYSQL: true, meaning our RDBMS instances run in a container in the same Kubernetes cluster and namespace as the rest of each Open edX platform.

For certain reasons we have always run with MariaDB 10.4, which is a MariaDB release that is binary-compatible with MySQL 5.7. With Palm, Tutor moves from MySQL 5.7 to MySQL 8, which means we are now considering to migrate to Oracle MySQL, but it doesnā€™t seem like thatā€™s the worst of our worries right now ā€” instead, itā€™s the transition from the utf8 character set to utf8mb4.

@dave discussed this in a post last year:

And then much more recently, the encoding transition appears to have caused some major issues for people who upgraded from Olive to Palm, as evident by this emergency fix that @regis put into Tutor, and which precipitated the Tutor 16.1.0 release:

Iā€™m having a bit of difficulty reconstructing what happened around this issue in the interim (that is, between July 2022 and August 2023), since the release notes for Olive and Palm are both silent about MySQL encoding considerations. (I also checked the Nutmeg ones for good measure.)

What I can do is run the following query against one of our Olive databases:

SELECT COUNT(CONCAT(TABLE_SCHEMA, '.', TABLE_NAME, '.', COLUMN_NAME)) AS column_count,
       CHARACTER_SET_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA='openedx'
GROUP BY CHARACTER_SET_NAME;

ā€¦ and that returns:

+--------------+--------------------+
| column_count | CHARACTER_SET_NAME |
+--------------+--------------------+
|         2555 | NULL               |
|         1559 | utf8               |
|            7 | utf8mb4            |
+--------------+--------------------+

So for some reason, there are already some utf8mb4 columns mixed in between the lot of utf8 (aka utf8mb3) ones. (I wonder why, by the way. @jmbowmanā€™s comment on the aforementioned Tutor PR sounds to me like this is unexpected.)

So, whatā€™s the way forward in this situation? Ultimately I believe we want everything to be utf8mb4, but the discussion on Tutor PR 890 seems to indicate that this is far from trivial. What Tutor does now (as of 16.1.0) is set --character-set-server=utf8mb3 --collation-server=utf8mb3_general_ci when invoking mysqld, and Iā€™m not sure what effect that has on data that lives in columns that are already set to utf8mb4.

Perhaps someone can shed some light on this. :slight_smile: I guess the extremely condensed version of my question is this:

If one upgrades

  • from Open edX Olive running on MySQL 5.7 or MariaDB 10.4
  • to Open edX Palm running on MySQL 8.1 with --character-set-server=utf8mb3 --collation-server=utf8mb3_general_ci,

then

  • is there any manual in-place data conversion that needs to be done, and
  • are any problems expected to arise from the fact that an existing database might already contain utf8mb4-charset columns?

Thanks in advance!
Cheers,
Florian

I can chime in on this part. There are a few apps that explicitly set their tables to be utf8mb4 because the thinking was:

  1. It might take a long time for us to switch everyone over to utf8mb4 because it will require some amount of downtime.
  2. Some apps need this support more than othersā€“e.g. itā€™s probably okay if your course title isnā€™t allowed to have an emoji or more obscure mathematical symbols, but itā€™s not okay for course content to have that restriction.
  3. Even if we canā€™t actually store utf8mb4 data in these tables yet (because the connection is utf8mb3), weā€™ll at least have the columns for these new apps in the correct format so that we can avoid an expensive future migration that requires downtime when the time comes to switch over to a utf8mb4 connection.
  4. uf8mb3 is a subset of utf8mb4, and the data written into utf8mb4 columns through a utf8mb3 connection will still be valid (even if emojis couldnā€™t be written because theyā€™re not supported in utf8mb3).
  5. We can at some point switch the Django connection to be utf8mb4 which will allow us to write full Unicode to these new columns. We still wonā€™t be able to write full Unicode to utf8mb3 columns that havenā€™t been migrated (because the database will reject the text as invalid), but we werenā€™t able to do that before anyhow, so we donā€™t lose anything.

The GitHub issue you reference definitely casts doubt on this line of reasoning. Iā€™m not entirely clear what happenedā€“tutor #890 seems point towards a table corruption bug introduced in MySQL 8.0.33, that was later patched in MySQL 8.0.34. That makes more sense to me, since I wouldnā€™t expect the database would allow the client to corrupt data by doing a select with the wrong connection settings. What Iā€™m not clear on is whether upgrading the database alone fixes the bug, or if it really manifests anytime a utf8mb4 connection operates on utf8mb3 tables. (I just asked that question at the bottom of the issue).

1 Like

Thanks! This reminds me that I should probably add a few more bits of information for reference:

  • The PR 890 thread explains that the issue identified was most likely fixed in 8.0.34, citing the release notes for that release (specifically, the reference to bug #35410528, but this is a reference to an internal database so we canā€™t use this to dig any further).
  • Rather than updating to MySQL 8.0.34, @regis decided to go directly to MySQL 8.1, because that release ā€œseems to fix a large number of bugsā€.

I wonder if the move to MySQL 8.1 by default wasnā€™t a bit premature, since 8.1 is not currently in General Availability like the 8.0 series, but is rather labeled an Innovation Release ā€” which happens to be a completely new category, so Iā€™m not sure if most people would be all too comfortable running it. 8.1 is also not supported on AWS RDS at this time, which means that Tutor-managed MySQL databases now diverge by default from whatā€™s arguably the most popular Open edX RDBMS option.

What do others think?

This is a complex issue, and Iā€™ve not been to the bottom of it. So Iā€™m not able to answer all your questions yet.

Yes, I saw that, too. The tables come from blockstore.bundles: https://github.com/openedx/blockstore/blob/14b2a844e6c5244980f5a95d2061a505429dba15/blockstore/apps/bundles/models.py#L70

Source:

mysql> select table_schema, table_name, column_name, character_set_name from information_schema.COLUMNS where character_set_name="utf8mb4" and table_schema="openedx";
+--------------+-----------------------+--------------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME            | COLUMN_NAME        | CHARACTER_SET_NAME |
+--------------+-----------------------+--------------------+--------------------+
| openedx      | bundles_bundle        | description        | utf8mb4            |
| openedx      | bundles_bundle        | slug               | utf8mb4            |
| openedx      | bundles_bundle        | title              | utf8mb4            |
| openedx      | bundles_bundleversion | change_description | utf8mb4            |
| openedx      | bundles_collection    | title              | utf8mb4            |
| openedx      | bundles_draft         | name               | utf8mb4            |
+--------------+-----------------------+--------------------+--------------------+
6 rows in set (0.04 sec)

My 2 cents: if you are already able to upgrade to utf8mb4, then by all means please do. I did not want to push a rushed charset change in-between major releases, so I decided to stick with utf8mb3.

No, I donā€™t think so.

If there are some problems, they should already exist in Olive. So I would not worry about it.

This is incorrect. Data corruption would occur even with 8.0.34 or 8.1.0. What matters is the explicit client-side charset definition. We only upgraded to 8.0.34 because it included a series of unrelated bugfixes. And then we decided to go straight to 8.1.0 for the same reason (but that was a mistake, see below).

The latter.

It was. I had not realized that 8.0 was considered LTS. If I had, I would have stuck with 8.0.34. But Iā€™m not too worried about that decision, as 8.1 should be backward-compatible with 8.0.

No problem, and thanks for the comprehensive reply!

Hmmm, how would I be able to tell if Iā€™m ā€œable to upgradeā€? From your comment on the PR thread it looks to me like the only way to test would be switching the server-side settings to --character-set-server=utf8mb4 --collation-server=utf8mb4_general_ci, running a migration, and then seeing if anything breaks ā€” but for the migration to do anything meaningful, it would only make sense to do this on an Olive data dump in the middle of an upgrade to Palm. Is there any way to test this after weā€™re through with the Palm upgrade?

Makes sense, considering with MySQL 5.7/MariaDB 10.4 they would only have been populated with the subset of utf8mb4 thatā€™s equivalent to utf8mb3. Did I understand that correctly?

OK. My concerns from upthread still stand ā€” Iā€™m a little bit hesitant to run with a MySQL version that the majority of Palm users arguably wonā€™t be using[1], so we currently run our Palm tests with DOCKER_IMAGE_MYSQL: docker.io/mysql:8.0.34 and see where that takes us. :slight_smile:


  1. This is on the assumption that more than 50% of learners using a production Open edX instance use one that runs on AWS with RDS. I have no hard data to back this up; itā€™s just a hunch. ā†©ļøŽ

Yes. I donā€™t see why you would have to make the switch in Olive. Instead, I suggest you switch your server and client to utf8mb4 after the Palm upgrade.

Yes.

My assumption is the opposite: I expect that the vast majority of people do not use RDS. Actually, they probably donā€™t use AWS at all.

Right, but my question was aiming at something else: how does one reliably test, after the upgrade to Palm is complete, that nothing broke in the process?

Palm is a named release which was extensively tested. I would suggest you only test your own customizations to check that the upgrade worked.

Yes but is there a means to test that an installation is unaffected by this specific issue, namely the UTF-8 encoding problem (which evidently wasnā€™t detected during Palm release testing)?

As in, if we run an upgrade and all migrations succeed, does that mean that all is definitely well with the database data? Or is there a scenario in which migrations might succeed and data corruption still occurred, and if yes, is there a test for that?

Right, sorry about that, I misunderstood your question.

You will definitely notice the encoding issue very quickly if you face it. Migrations will succeed, but your platform will have many 500 errors. The Python stacktrace will include UnicodeDecodeError exceptions. You will be unable to run mysqldump on your database. The following query will render garbled data:

tutor local do sqlshell openedx
select * from course_overviews_courseoverview;

OK, thanks! ā€œRun mysqldump and if it succeeds weā€™re most probably fineā€ is a test we can work with. :slight_smile:

Important update from the PR thread from @regis. TLDR excerpt:

OK Iā€™ve done some more research and I think Iā€™ve finally found the root cause, which is not at all related to the utf8mb3 charset. Instead the issue is caused by that MySQL bug #35410528.

(snip details)

Thus, the following are all working fixes:

  1. Restarting the mysql 8.0.33 container just prior to applying migrations.
  2. Upgrading to mysql 8.0.34 or 8.1.0 before applying migrations. (in that case restarting the container is unnecessary)

@regis: Thank you for digging into this so thoroughly!