Adding index on BlockCompletion model

We’d like to add an index on the completion_blockcompletion table to improve performance when filtering by the modified column. This leads to a few questions:

  1. Is there a deliberate reason this index doesn’t already exist? Are there risks in adding it?
  2. If it is fine to add, then what is the best approach? Add the index to the model definition in a fork of the completion repo? Create a custom migration that manually adds the index in my own custom plugin?

@dave are you familiar with this area of the code?

There is already an index on ('user', 'context_key', 'modified'). Are you trying to find/filter the latest modified blocks across users or across courses?

I suspect nobody else has had this issue yet because most usage patterns are single user within a single course (“context”), so the current index works. Or perhaps it is a bottleneck affecting others and you’re the first to identify it.

If an additional or replacement index makes sense and if we can determine that it isn’t going to cause performance issues for large installations like 2U’s, let’s just add the index to the completion app for everyone.

Thanks for the reply @braden. My particular use case at the moment is related to incrementally syncing this dataset to a data warehouse for reporting. The intention is to filter where modified > last_job_runtime. For example, a query for today’s completion activity takes almost 30s just to return a count of records.

mysql> select count(1) from completion_blockcompletion where modified > '2023-10-26 00:00:00';
| count(1) |
|     7137 |
1 row in set (29.58 sec)

Got it. That makes sense to me.

Thanks for confirming. In that case what would be the appropriate way to verify whether this new index would cause issues for instances with bigger datasets?

That’s more of a @dave question, or perhaps someone from 2U can answer.

Sorry, I forgot to reply to this thread. My understanding is that creating an index for an existing field causes locking on regular MySQL 5.7, but does not cause locking on AWS’s Aurora (which is what runs). I don’t know what the behavior for non-Aurora MySQL 8.0 is.

This is the closest example I can find where we’ve done this before was the big Lilac courseware_studentmodule index creation:

Update, based on a couple of runs we’ve done:

Two MySQL databases we ran this on both went at about the rate of 2 million rows per minute.

An Aurora instance on AWS RDS completed this without locking the table. Write latency was slightly elevated, but it did not impact users on the site.

A non-Aurora instance on AWS did lock the table, and downtime was required to run this.

I think you should discuss merging any such PR with the #cc-edx-platform channel in Slack. My understanding is that 2U is in the middle of some fairly involved database upgrade work at the moment, and they may request you to hold off until that has completed. I definitely think that it’s a PR worth making though, even if it takes a little longer to get merged.

FYI to @jmbowman and @adzuci on the 2U side.

1 Like