Adding a grades-per-problem API endpoint

Hi all,

I’m new to Open edX, and I’d love to hear your opinions about a feature our company requires, which we’re considering proposing for the Open edX codebase.
I would like to know whether you think other parties might be interested in such a feature, and whether I should start deep-diving into the code and preparing for the pull request process, or if the idea is doomed from the start!

Our product consumes information about courses, students, and grades from a partner company’s Open edX instance, and the only thing missing from the built-in API is the ability to get student grades for each problem block.
Currently, the lowest level available is grades per vertical, but we need this information at the problem level.
What we need is the raw score, coupled with the problem’s max possible score (ideally the max possible score would come from a separate API, but we can settle for this approach as well :)).
The score should be accompanied by the problem’s block ID and the responding user’s username, because that’s the best identifier we can get from the rest of the API (no user IDs are exposed in any form).

Side note: I’m not sure if the username issue is specific to our “provider” or not, but if it is, we’d want the ability to tell the API which user fields to return (in this case, the username).

The endpoint should receive a course (or CCX) ID and return the grades for all the problems and for each student.
Optionally, it could receive a student ID/username/email and return only that student’s grades.
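To make this concrete, here’s a rough sketch of the response shape I have in mind. The endpoint path and all field names below are purely illustrative, not an existing API:

# Hypothetical response for GET /api/grades/v1/problem_grades/{course_id}
# -- the path and every field name here are placeholders, not a real endpoint.
example_response = [
    {
        "username": "learner1",
        "block_id": "block-v1:Org+Course+Run+type@problem+block@abc123",
        "earned": 1.0,
        "possible": 5.0,
    },
    {
        "username": "learner2",
        "block_id": "block-v1:Org+Course+Run+type@problem+block@abc123",
        "earned": 3.0,
        "possible": 5.0,
    },
]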

Getting the actual user response itself (i.e., the content of the answer) would be a welcome bonus, but it’s not a must.

Assume that the company in question is reluctant to develop a new endpoint just for us, because of the future maintenance it would likely require of them, and that they object to enabling xAPI solely for this purpose.

And even if they agreed to go with xAPI, it could be problematic. Even assuming we’re okay with maintaining data retrieval from two vastly different sources, there are edge cases where, perhaps due to a bug, a response/evaluated event never reaches the LMS, or where we don’t retrieve it before it expires. In such cases our data would be partial, and for a dashboard app that’s the last thing we want. We need to represent the state of a course exactly as it is/was, so that users can make educated decisions.
We’d also have no way of knowing that an event is missing…

What are your thoughts?
I’ve already encountered someone in the forums who asked for the same thing…

Thanks a lot in advance!
Dor.


Disclaimer: it’s been a very long time since I was involved in grading-related code, and I’m not sure what APIs are out there now.

The biggest issue with building something like this is that getting all of a student’s scores in a course can be a very slow process. Not because it’s hard to get those rows from the database (there are only a couple of models where this data is stored, even if the StudentModule model isn’t indexed quite right for these queries). The slow part is generally figuring out which permutation of the course any particular user sees, and what the possible scores are for the many problems they haven’t answered. This overhead grows with the size of the course content, and can take 3–5 seconds for a single user on some of the largest courses (say, those with hundreds of problems).

That overhead makes it hard to build an API that can reliably be used for more than one student at a time, and is why that data is mostly generated asynchronously into CSV files. I have a very vague recollection that MIT created a separate grades API a long time ago that does what you’re looking for, but I can’t seem to find it at the moment (@pdpinch, do you remember this, by any chance?).

The progress API (/api/course_home/progress/{course_id}) is the closest thing that I know of to what you want today. It does take an optional student_id as an additional part of the URL, which can be used if the requesting user has course staff permissions. It seems a bit unusual in that the parameter is expected to be a student_id (integer) instead of a username like most other edx-platform APIs that provide that kind of functionality.

The main things the progress API is missing are (a) the ability to specify the student by username instead of student_id; and (b) block usage keys to accompany the problem_scores entries, which currently are just a list of earned/possible pairs that looks like:

"problem_scores": [
    {
        "earned": 0,
        "possible": 5
    },
    {
        "earned": 0,
        "possible": 2
    }
]
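If it helps, here’s a minimal sketch of calling that endpoint with requests. Auth setup is deployment-specific and only stubbed out here, and exactly where problem_scores sits in the payload can vary by release, so the sketch walks the JSON generically:

import requests

LMS = "https://lms.example.com"  # placeholder host
COURSE_ID = "course-v1:Org+Course+Run"
STUDENT_ID = 42  # optional extra URL segment: an integer user id, not a username

# Assumes you already have a token or session for a course-staff user;
# how you obtain it depends on your deployment.
resp = requests.get(
    f"{LMS}/api/course_home/progress/{COURSE_ID}/{STUDENT_ID}",
    headers={"Authorization": "JWT <token>"},  # placeholder credential
)
resp.raise_for_status()

def iter_problem_scores(node):
    # Walk the payload generically, since the nesting of "problem_scores"
    # within the response can differ between releases.
    if isinstance(node, dict):
        yield from node.get("problem_scores", [])
        for value in node.values():
            yield from iter_problem_scores(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_problem_scores(item)

for score in iter_problem_scores(resp.json()):
    print(score["earned"], "/", score["possible"])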

That being said, this would only be useful for one learner at a time. For multiple students at a time, you’d probably want some sort of asynchronously triggered report generation (you could look at the instructor downloads to see which one is closest). You could build a job that uses django-user-tasks to help you generate one of those.
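As a very rough sketch of that kind of job (plain Celery shown here for simplicity; fetch_problem_scores is a hypothetical helper standing in for the real grade queries):

import csv
from celery import shared_task

@shared_task
def generate_problem_grades_csv(course_id, output_path):
    # Hypothetical report task: every name here is a placeholder.
    # A real implementation would query the persistent grades /
    # StudentModule tables and write to course report storage,
    # not a local file path.
    rows = fetch_problem_scores(course_id)  # hypothetical helper
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["username", "block_id", "earned", "possible"])
        writer.writerows(rows)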

If you are looking to feed another system with near-realtime updates, you could also key off of one of the low-level score events. If you need more metadata than those events offer, you could hook into the part of the code where grades are recalculated, which happens after every score change if persistent grades are enabled.
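For the signal route, the shape would be something like the sketch below. The signal lives in edx-platform’s openedx.core.djangoapps.signals.signals as of recent versions, but treat the exact name and kwargs as release-dependent, and push_to_dashboard is a made-up stand-in for whatever consumes the data:

from django.dispatch import receiver
# Verify this import against your release; these internals can move.
from openedx.core.djangoapps.signals.signals import PROBLEM_WEIGHTED_SCORE_CHANGED

@receiver(PROBLEM_WEIGHTED_SCORE_CHANGED)
def forward_score_change(sender, **kwargs):
    # kwargs typically carry the user id, course id, usage key, and the
    # weighted earned/possible values -- inspect them in your version.
    push_to_dashboard(kwargs)  # hypothetical sink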

I realize I threw a lot of stuff out there. I hope some of it is helpful. Good luck with your investigation!


Hi David,

Thanks so much for the info, and sorry for the late response!

I knew that instructors can hide/show certain things, but I thought that affected all learners. I didn’t know they could create different permutations for different users.
But why would we need to figure out the permutation each user sees? I was thinking of simply returning whatever data is in the table as-is, and letting the API clients make their own interpretations of it.

Regarding the possible score for problems they haven’t answered: first, what if we just don’t return a row for those problems for that learner? Again, this can be off-loaded and interpreted externally.
Second, I’m not sure how the grades data is stored; can every learner have a different possible score for the same problem? Are the possible scores for problems stored in a separate table (without the grades)?
Either way, couldn’t we query all the possible scores for each problem separately and add that to the response, instead of attaching a “possible” to each score?
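Something like this is roughly what I had in mind by “as-is” (a sketch only; I’m assuming StudentModule’s fields from a quick look at the code, so the import path and field names may need adjusting):

from lms.djangoapps.courseware.models import StudentModule

def raw_problem_scores(course_key):
    # Dump per-problem scores straight from the table, skipping rows
    # where the learner never produced a score.
    return (
        StudentModule.objects
        .filter(course_id=course_key, module_type="problem", grade__isnull=False)
        .values_list("student__username", "module_state_key", "grade", "max_grade")
    )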

Regarding the progress API: what would be the implications of adding the problem block ID to each score? Also, why would it only be useful for one learner at a time? Is the issue again on the performance side?

I’m not too keen on the signals option, because from what I could gather it’s very similar to xAPI in the sense that it’s event-based, and is therefore less reliable and harder to implement and maintain than querying a table… Also, I’m not sure the other company would be easily convinced to implement it… but it’s good to know this functionality exists!

We have thought about asynchronously triggered report generation; that’s actually what we have today, only with a lot of missing data about the course structure, which is the main reason we want to use the API. But if all else fails, this seems like a reasonable hybrid approach.

One last thing: I saw a lecture from MIT about how they get/deliver grades via an API, and from what I understand they used a leftover API from an older version of Open edX. I’m pretty sure it hasn’t been ported to the current one, but maybe I’ll investigate that direction a bit more.

Thanks again for the detailed response, you really helped!


I only have time for a really short response tonight, but I will respond later this week. In the meanwhile, please check out the really great documentation that the Aurora team wrote up for grading.

Edit: Apologies. Tomorrow I need to look into why that page is not publicly available. :frowning_face:


In the meanwhile, please check out these docs, which are definitely open: