Checking PII annotations with a lower coverage threshold

In 2019, we resolved to annotate all instances of PII (personally identifiable information) in our data models, plus information on how & why that PII is stored and whether & how it is retired. The purpose of these annotations are both to hold developers accountable to properly handle PII and to enable Open edX operators to audit their own PII handling.

In the edx-platform repository, we (ostensibly) used CI to ensure that these annotations were being added to all Django models. We originally set the threshold that 94.5% of all models in edx-platform and its dependencies needed to be annotated.

At some point, however, this CI check was disabled. At some later point, the check was re-enabled, but it was not actually wired up correctly and has been raising a false-positive all this time. So, we have not been enforcing the 94.5% threashold for a while, and as you can guess, the % of annotated Django models has dropped over time.

As it stands today, 71.6% of edx-platform Django models are annotated. So, the Aximprovements team will fix the PII annotation CI check and adjust it to use 71.6% as the baseline threshold. They do not currently have the resources to fix the violations that have accumulated, though.

If any contributors are interested in working to add missing violations and raise the threshold back to thtat 90-100% range, feel free to use this thread to coordinate. Otherwise, just consider this a heads-up that many edx-platform models, particularly ones added between Juniper and Sumac, will be missing documentation of how & whether they use PII.

2 Likes

I’m digging into this and writing up a spreadsheet of what the tool reports to be missing. So far it seems like something got broken in the years since anyone has looked at this, so it’s giving false positives on things annotated .. no_pii:. That’s good news, since a number of things I’m finding so far are correctly annotated. Once I’ve made some progress on the spreadsheet I’ll shift gears to fixing the bug.

It’s also worth noting that what I’m looking into is a pretty vanilla LMS install, and if folks are running with additional plugins / packages etc. they will have more surface area than what I will be able to cover.

1 Like

@bmedx was able to get the PII annotation threshold up from 76% to 88%! chore: Add missing PII annotations, update safelist by bmtcril · Pull Request #35779 · openedx/edx-platform · GitHub

In addition to raising the PII annotation coverage close to its old value, are also looking to fix the remaining PII annotation linting errors so that we can once again begin enforcing that annotations are always well-formed. I believe this PR is the only blocker to that: build: Assume that SoftwareSecure PII is retained (to fix annotations) by kdmccormick · Pull Request #1247 · openedx/edx-proctoring · GitHub