Does anyone know why a (relatively simple) special character like á in Sánchez would be breaking and showing up as “Sánchez” in a course completion certificate?
Could this have to do with my utf8mb4 trials and tribulations and perhaps data got corrupted?
Definitely looks like an encoding issue, but not necessarily data corruption (depending on where it’s showing up). I really don’t know much about what tables we store this in today (I hope someone else will be able to provide that), but I do know that “á” is what gets spit out if you take an “á” that is stored as UTF-8 but then interpret those bytes as Latin-1:
>>> "á".encode('utf8').decode('latin1')
'á'
This character should be encoded the same way under both utf8mb3 and utf8mb4, so it shouldn’t be affected by the difference between the two, which is what all the pain around broken emojis and the recent conversion work in Tutor is about.
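To make the utf8mb3 vs. utf8mb4 point concrete, here is a small sketch. It relies on the fact that MySQL's utf8mb3 stores at most 3 bytes per character while utf8mb4 allows up to 4. The helper names `utf8_len` and `fits_utf8mb3` are made up for illustration and are not part of any Open edX or Tutor code.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes a string takes when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 UTF-8 bytes (the utf8mb3 limit)."""
    return all(utf8_len(ch) <= 3 for ch in text)


print(utf8_len("á"))            # 2 bytes: fine in both utf8mb3 and utf8mb4
print(utf8_len("😀"))           # 4 bytes: only fits in utf8mb4
print(fits_utf8mb3("Sánchez"))  # True
print(fits_utf8mb3("😀"))       # False
```

So a name like Sánchez is stored byte-for-byte identically in either character set; only 4-byte characters such as emoji are affected by the difference.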
Thanks for the pointer. I don’t know what happened, but the broken character was actually present in the student’s record itself, so I fixed it by just editing their name in their account. I think this probably means it is broken for others too, and they just haven’t noticed or mentioned it yet. But I don’t see any good way to deal with it other than on a case-by-case basis. :-/