Report generation is taking too much time

Hi everyone! As I don’t know where to ask this, I’ll put it here, in development.

Our partners are generating student_state_from_block reports to get all the answers that students gave for a specific exercise. One of the main differences in these courses is that they contain a custom Python-evaluated problem we created, whose responses are extracted as well.

Currently, the extraction of these reports is taking way longer than expected and hitting the DB connection timeout we’ve set. I’m wondering whether, for response extraction purposes (to get the value of the input field), all of these Python-evaluated problems are being evaluated and adding to the processing time. Can somebody share some insights about this?

@dave is on vacation currently, but when he’s back, he might be able to give an answer!

Awesome. I’ll wait for @dave to be fully energized :slight_smile:
Thank you @sarina.

Hi @sandro! A couple of questions:

  1. What release are you running?
  2. Can you please post a screenshot of the LMS button that’s pushed, just to make sure I’m thinking of the correct report?

I suspect what’s going on is this bit:

That allows any XBlock to implement a generate_report_data() method in order to give better-formatted responses for this report. ProblemBlock implements this:
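
For reference, here’s roughly the shape of that hook as I remember it (a simplified sketch, not the actual edx-platform code; the real ProblemBlock implementation also reconstructs the capa problem so it can include question text and correctness):

```python
# Simplified sketch of the hook, from memory; not the actual implementation.
def generate_report_data(self, user_state_iterator, limit_responses=None):
    """
    Yield (username, row_dict) pairs for the problem responses report.

    `user_state_iterator` streams the stored state for every learner who
    interacted with this block, so rows can be produced without fully
    instantiating an XBlock runtime per learner.
    """
    count = 0
    for user_state in user_state_iterator:
        # user_state.state is the persisted JSON state for one learner;
        # the real code pulls the submitted answers out of it.
        answers = user_state.state.get('student_answers', {})
        for answer_id, answer in answers.items():
            if limit_responses is not None and count >= limit_responses:
                return
            count += 1
            yield user_state.username, {
                'Answer ID': answer_id,
                'Answer': answer,
            }
```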

So I think you’re right–the ProblemBlock is probably doing all that Python sandbox setup and calculation, slowing things down. It does look like the code takes some pain to avoid fully instantiating things in order to save time and memory:

It’s possible that this either wasn’t sufficient, or that there was some performance regression that happened later on that wasn’t caught. I’m afraid that’s about as far as I can get without setting up test data and doing profiling. I hope that helps narrow things down though.
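
For anyone who does want to dig in, even a quick cProfile pass around the report-building call should show where the time goes. This is just a generic sketch; `build_report_rows` is a placeholder, not the real task entry point:

```python
# Generic profiling sketch; build_report_rows is a placeholder callable,
# not the actual edx-platform task entry point.
import cProfile
import pstats


def profile_report(build_report_rows, *args, **kwargs):
    """Run the report-building callable under cProfile and print the hot spots."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = build_report_rows(*args, **kwargs)
    finally:
        profiler.disable()
    # Sort by cumulative time so sandbox setup / capa evaluation shows up near the top.
    pstats.Stats(profiler).sort_stats('cumulative').print_stats(25)
    return result
```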

Broadly speaking, I think some options are:

  1. See if there’s space to optimize this further, after inspecting it with a profiler.
  2. Offer a non-formatted version that always pulls raw state data.

That being said, any change to the CSV output would be a breaking change. We’d either need to create a new report entirely, or make it an opt-in flag, and that would likely get confusing for users.

It’s also possible that this data can be better accessed through Aspects now. On that, I defer to @TyHob and @Sara_Burns.

Hey @dave.
We’re using the following form to extract the problem responses:

We are currently running Redwood and hopefully migrating to Teak later this year.

These are the results from our analysis in the past week:

  • Every single Python-evaluated problem was sent to our codejail container for evaluation.
  • This created an issue with our only codejail container being overwhelmed with requests.
  • Due to the codejail issue, the connection to MySQL was timing out constantly, as codejail was taking too much time to complete tasks while the connection was open.

With that in mind, I have two questions for you.

First, why are these CSV report tasks not running in the lms-worker-highmem, as they tend to be quite intensive when a course has a lot of enrollments?
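
Roughly what I’m picturing is routing the report task to the high-memory queue explicitly, something like this (the task path, queue name, and exact Celery setting are just my guesses; I haven’t confirmed how the routing is actually configured):

```python
# Illustrative only: route the problem-responses report task to the
# high-memory worker queue. The task path, queue name, and even the exact
# Celery setting depend on how the deployment is configured; these are guesses.
CELERY_TASK_ROUTES = {
    'lms.djangoapps.instructor_task.tasks.calculate_problem_responses_csv': {
        'queue': 'edx.lms.core.high_mem',
    },
}
```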

Second, couldn’t we run these tasks using the read_replica?
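
For this second question, I mean the standard Django mechanism, something along these lines (the model fields and the 'read_replica' alias are from memory, just to show the idea):

```python
# Sketch only: point the heavy read at the replica instead of the primary.
# Field names are from memory, and a 'read_replica' alias must exist in DATABASES.
from lms.djangoapps.courseware.models import StudentModule


def iter_block_state(course_key, block_key):
    """Stream stored student state for one block from the read replica."""
    return (
        StudentModule.objects
        .using('read_replica')
        .filter(course_id=course_key, module_state_key=block_key)
        .iterator()
    )
```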

With these questions we are just trying to squeeze some extra performance out of the system, as we think this can be optimized a bit without breaking anything. We are interested in opening PRs if the two questions make sense to you.

For now, we have scaled the number of codejail pods to meet the demand and adjusted the timeout values to make sure it has everything it needs to evaluate the massive number of reports we have. We are currently monitoring everything.

I also like this idea, as most course staff only want to see the response a user submitted via a text input field. I don’t think we should be evaluating everything for this type of report.

I don’t know for sure, but my guess was that there’s no deep reason for it. Things tended to get moved over to the highmem workers in reaction to observed operational issues. It’s possible that this report just never rose to that level of attention on edx.org.

I’m not sure. Intuitively, we only need to read the data. But given how this kind of code has historically worked, it’s very possible that the simple act of figuring out whether or not the answer is correct will have the side-effect of rewriting state. (This was an intentional feature once upon a time, even if it’s much less useful these days.)