Potentially Very Slow Database Migration About to Merge to edx-platform

I’ve created a PR to add a composite index to courseware_studentmodule in order to improve the performance of certain kinds of reports that need to look at individual problems across many users:

My target merge date is 2021-03-05. Please respond here or on the PR if you have concerns.

On an AWS instance with 40 million rows in it, using RDS but not Aurora, this took about twenty minutes to run, during which it locked the table entirely. For a larger site like edx.org, where we have billions of rows in this table, it will take hours, possibly over a day–though Aurora is supposed to let us do it in a non-locking manner. We’re going to be testing the lock behavior of Aurora while updating one of our instances shortly, and I’ll put an update here before merging the PR.

Those who follow the master branch closely or run migrations in their CI will want to take precautions. We’re taking the approach of adding the index manually (and so outside the usual release process), and then faking the migration. This may also increase the time needed to run the migration to Lilac when that comes out.

There are many more details about how we got here and why these particular columns are being indexed in the pull request summary. My apologies for the inconvenience of this.

2 Likes

Update, based on a couple of runs we’ve done:

Two MySQL databases we ran this on both went at about the rate of 2 million rows per minute.

An Aurora instance on AWS RDS completed this without locking the table. Write latency was slightly elevated, but it did not impact users on the site.

A non-Aurora instance on AWS did lock the table, and downtime was required to run this.

If you are trying to do this without downtime, I highly recommend you spin up a replica and test it out first.

1 Like