Incomplete CSV Response Export for library_content Blocks in Instructor Dashboard

Hellooo everyone! :blush:
I’m Kevyn from eduNEXT.

I was working with the CSV reports and found a bug that I’d like to share with all of you, so we can figure out what to do about it:

Summary

When using the Instructor Dashboard to export student responses via the “Create a report of problem responses” button and selecting a block of type library_content (i.e., a legacy content library), the resulting CSV file contains incomplete data. Specifically, it omits the majority of actual student responses submitted to the randomized child problems served by that block.

This issue has been confirmed by comparing:

  • The default report generated via the UI (selecting the library_content block directly).

  • A manual report generated with a workaround that resolves and submits the individual children of the library_content block.

:stop_sign: Note: This issue applies only to Legacy Content Libraries using library_content blocks, not to Content Libraries v2.

Steps to Reproduce

  1. Create a legacy content library containing multiple problem components, and insert it into a course unit using a library_content block (randomized).

  2. Publish the course unit.

  3. Allow students to submit responses to the randomized quiz.

  4. In the Instructor Dashboard, select the library_content block.

  5. Click “Create a report of problem responses”.

  6. Compare the resulting CSV to a report generated by resolving and submitting the children of that block (see the workaround below).

Workaround (Python Script)

This workaround resolves and submits each child problem inside the legacy library_content block and returns complete results:

from xmodule.modulestore.django import modulestore
from opaque_keys.edx.keys import CourseKey, UsageKey

# submit_calculate_problem_responses_csv lives in the instructor_task API
# (the exact import path may vary by release).
from lms.djangoapps.instructor_task.api import submit_calculate_problem_responses_csv

# Replace these placeholders with your own course and library_content block IDs.
course_key = CourseKey.from_string("course-v1:Org+Course+Run")
component_location = "block-v1:Org+Course+Run+type@library_content+block@1234"

MODULESTORE = modulestore()

# Resolve the library_content block and collect all of its children, i.e.
# every randomized problem any student could have been served.
component_key = UsageKey.from_string(component_location).map_into_course(course_key)
component = MODULESTORE.get_item(component_key)
component_keys = [str(locator) for locator in getattr(component, "children", [])]

# create_task_request is a small helper (not shown) that builds a request-like
# object authenticated as the given user.
submit_calculate_problem_responses_csv(
    create_task_request("admin"),
    course_key,
    problem_locations=",".join(component_keys),
)
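
This is meant to be run from an LMS Django shell (e.g. via manage.py); the course_key and component_location values above are placeholders, and create_task_request is a helper you would need to supply yourself, since building the request object depends on your deployment.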

Supporting Data

  • Test conducted with 101 students enrolled.

  • 94 students actually responded to the randomized quiz.

  • The default UI report returned only 28 total response records.

  • The workaround report (resolving children) returned 188 total records.

:right_arrow: That’s a discrepancy of 160 missing responses, meaning the Instructor Dashboard report omitted 85.1% of valid student answers.

Additionally:

  • All missing records had block_keys corresponding to children of the library_content block.

  • Users who did not answer the quiz appeared only in the UI report and were absent from the workaround report (expected behavior).


Root Cause

Incorrect handling of library_content blocks

The submit_calculate_problem_responses_csv task, when triggered via the Instructor Dashboard, treats the library_content block as a direct assessment node and passes it as-is to the report generator.

Randomized children not resolved

However, library_content is a container for randomized child problems. Student responses are recorded against those children, not the parent block. Failing to resolve them results in an incomplete report.

User-dependent block visibility

The _build_problem_list function (in lms/djangoapps/instructor_task/tasks_helper/grades.py) uses modulestore queries that filter blocks based on the visibility of the user generating the report. This leads to incomplete problem listings in cases where different students see different randomized blocks — such as in legacy content libraries.
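
To make the visibility point concrete, here is a minimal sketch of the pattern at play. This is only a sketch, assuming the get_course_blocks API and modulestore paths from recent edx-platform releases (they may differ in yours); library_block_key and the username are placeholders:

from django.contrib.auth import get_user_model
from lms.djangoapps.course_blocks.api import get_course_blocks
from xmodule.modulestore.django import modulestore
from opaque_keys.edx.keys import UsageKey

# Placeholder identifiers, for illustration only.
library_block_key = UsageKey.from_string(
    "block-v1:Org+Course+Run+type@library_content+block@1234"
)
report_user = get_user_model().objects.get(username="staff_member")

# User-aware traversal: transformers (including the library content one)
# prune the tree to the randomized children selected for this user.
blocks_for_user = get_course_blocks(report_user, library_block_key)
children_for_user = blocks_for_user.get_children(library_block_key)

# Raw modulestore item: the full children list, regardless of user.
all_children = modulestore().get_item(library_block_key).children

# children_for_user is typically a strict subset of all_children, which is
# why a report built from the user-aware structure misses responses.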


Impact

  • Instructors receive incomplete and misleading reports for legacy content libraries.

  • Assessment and grading accuracy is compromised.

  • Additional manual intervention is required to retrieve correct data.

  • Can undermine trust in platform analytics or auditing processes.


Proposed Solution

High-Level Fix

To resolve the issue and ensure instructors receive complete reports for legacy library_content blocks, the report flow should:

  1. Detect when the selected problem_location is of type library_content.
  2. Automatically resolve and include all of its child problems using the modulestore (sketched after this list).
  3. Generate the report using the resolved child problem block IDs.
  4. Ensure that only instructors or admins can trigger the report, to protect student privacy.
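
As a rough illustration of steps 1–3 (a sketch under assumptions, not a patch; expand_library_content is a hypothetical helper name, and the modulestore APIs are the same ones used in the workaround above):

from xmodule.modulestore.django import modulestore
from opaque_keys.edx.keys import UsageKey

def expand_library_content(problem_location, course_key):
    """Return the child locations of a library_content block, or the
    original location unchanged for any other block type."""
    usage_key = UsageKey.from_string(problem_location).map_into_course(course_key)
    if usage_key.block_type != "library_content":
        return [problem_location]
    block = modulestore().get_item(usage_key)
    return [str(child) for child in getattr(block, "children", [])]

The task could then join the expanded list with commas into problem_locations, exactly as the workaround script above does.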

Implementation Notes

An investigation identified that the issue also stems from how the _build_problem_list function in:

edx-platform/lms/djangoapps/instructor_task/tasks_helper/grades.py

retrieves problem blocks. This function applies visibility filters based on the requesting user, which causes it to exclude randomized or personalized problems not visible to that user.

To fix this:

  • Modify _build_problem_list to traverse the course structure using modulestore APIs that do not apply user visibility filters, so that all potential problems, including those served to different students, are included in the report (see the sketch below).
  • Continue enforcing access control at the task level, so that only authorized roles (instructor/admin) can run reports that expose student responses.
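
A minimal sketch of such a visibility-free traversal (assuming only the modulestore APIs already used above; iter_problem_blocks is a hypothetical helper, not the existing _build_problem_list signature):

from xmodule.modulestore.django import modulestore

def iter_problem_blocks(usage_key, store=None):
    """Yield the usage key of every problem descendant of usage_key,
    without applying any per-user visibility filtering."""
    store = store or modulestore()
    block = store.get_item(usage_key)
    if usage_key.block_type == "problem":
        yield usage_key
    for child_key in getattr(block, "children", []):
        yield from iter_problem_blocks(child_key, store)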

Thanks for the thorough report, @efortish! I’d like to understand better the impact of those changes on the visibility of randomized blocks in the reports and on the other parts of the platform. What else could be impacted by this change? Also, can we generate a sample report with the change you propose?


Hello @mgmdi, hope you are doing great!
The proposed change specifically improves the accuracy of instructor reports for legacy library_content blocks by ensuring that all randomized child problems, regardless of which ones were served to each student, are included in the exported CSV, rather than just the parent block or the problems visible to the instructor at report time. This adjustment does not affect the student experience, grading, or analytics elsewhere on the platform, as it only alters how the instructor report task resolves and collects problem blocks for reporting purposes. No other platform features are impacted, and access control remains enforced so that only instructors or admins can generate these comprehensive reports.
I’ll attach two samples, one before and one after the change I propose.

Here are the samples. I used 99 users × 5 questions each, for a total of 496 rows (including the header row).
This is before:

[attachment: CSV sample before the change]

And this is after:

[attachment: CSV sample after the change]