Extra history record stored on each problem submission

Each time a user submits a problem answer, two records are stored in the Courseware Student Module History table. This appears to happen because the StudentModule record’s fields are updated in two separate saves, each of which triggers a post_save event that inserts a new history record.

First capa_module.ProblemBlock.submit_problem publishes a grade event, which ultimately updates the StudentModule instance in courseware.model_data.set_score. At this point only the grade and max_grade fields are updated, which leaves the StudentModule instance internally inconsistent: the state field still reflects the data from the last submission, while the grade field reflects the status of the new submission.

Then the submit_problem function completes, and the standard xblock.runtime.Runtime.handle function saves the instance again. This time all of the StudentModule’s fields are updated, so the record ends up in the correct state.
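To make the sequence concrete, here is a stand-in simulation (not real edx-platform code; the dict model and listener are hypothetical) of how two saves per submission produce two history rows, the first of which is internally inconsistent:

```python
# Stand-in illustration of why two saves per submission yield two history
# rows when a listener fires on every save. Names are made up.

history = []  # stand-in for the StudentModuleHistory table


def post_save_listener(record):
    # Mimics the signal handler: snapshot the record on every save.
    history.append(dict(record))


def save(record, **updates):
    record.update(updates)
    post_save_listener(record)


student_module = {"grade": None, "state": "attempt 3 answers"}

# First save: set_score updates only the grade -> inconsistent snapshot
# (new grade, previous submission's state).
save(student_module, grade=1.0)

# Second save: Runtime.handle writes the new state -> consistent snapshot.
save(student_module, state="attempt 4 answers")

# history now holds two rows for a single submission.
```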

However, in the process the student module history table collects extra records. For example, in this submission history, both submissions 13 and 14 show attempts: 4. Note that in submission 14 the header score is 1.0 / 1.0, but in the JSON, score.raw_earned still shows 0 and the correct_map still says incorrect. This is because the state is left over, unchanged, from the earlier submission 13. Only when submission 15 is saved one second later does the header score match score.raw_earned.

#15: 2022-08-12 19:27:32 UTC

Score: 1.0 / 1.0
{
  "attempts": 5,
  "correct_map": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {
      "answervariable": null,
      "correctness": "correct",
      "hint": "",
      "hintmode": null,
      "msg": "",
      "npoints": null,
      "queuestate": null
    }
  },
  "done": true,
  "input_state": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {}
  },
  "last_submission_time": "2022-08-12T19:27:31Z",
  "score": {
    "raw_earned": 1,
    "raw_possible": 1
  },
  "seed": 1,
  "student_answers": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": [
      "choice_3"
    ]
  }
}


#14: 2022-08-12 19:27:31 UTC

Score: 1.0 / 1.0
{
  "attempts": 4,
  "correct_map": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {
      "answervariable": null,
      "correctness": "incorrect",
      "hint": "",
      "hintmode": null,
      "msg": "",
      "npoints": null,
      "queuestate": null
    }
  },
  "done": true,
  "input_state": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {}
  },
  "last_submission_time": "2022-08-12T19:27:20Z",
  "score": {
    "raw_earned": 0,
    "raw_possible": 1
  },
  "seed": 1,
  "student_answers": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": [
      "choice_0",
      "choice_1"
    ]
  }
}

    
#13: 2022-08-12 19:27:22 UTC

Score: 0.0 / 1.0
{
  "attempts": 4,
  "correct_map": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {
      "answervariable": null,
      "correctness": "incorrect",
      "hint": "",
      "hintmode": null,
      "msg": "",
      "npoints": null,
      "queuestate": null
    }
  },
  "done": true,
  "input_state": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {}
  },
  "last_submission_time": "2022-08-12T19:27:20Z",
  "score": {
    "raw_earned": 0,
    "raw_possible": 1
  },
  "seed": 1,
  "student_answers": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": [
      "choice_0",
      "choice_1"
    ]
  }
}

Do you have advice for how to avoid collecting extra noise and uphold the credibility of our learner performance dashboards?

Thank you!


I agree with your analysis on what’s going on and why. I added StudentModuleHistory during an early hackathon as a kludge to help the Intro to Python course team debug their student answers. So it does the dumbest thing possible and just listens for the save signal. It’s one of those expedient hacks that never quite made it to the point of becoming a “real” feature, while being just useful enough and small enough to avoid getting killed off. It remains a debug tool, and was never part of the researcher export like StudentModule was.

Some interesting edge cases:

  • State is often updated with no update to the score.
  • Score is sometimes updated with no update to student state (e.g. running a rescoring job after a bug was fixed in ProblemBlock).
  • Changes to StudentModule can take place in either web transactions or celery tasks (so we can’t rely on there being a request_finished signal).
  • State and score are sometimes out of sync for a period of time because the problem is being submitted to a remote grader via XQueue and the grader may take seconds or minutes to reply with the resulting score change. This was the original motivating debug scenario.

It would be nice if StudentModule student state and scoring info changed in a lock-step fashion, though the remote grader involvement means that it won’t always be that way. But the XBlock state management and scoring are different systems, and I worry about unintended side-effects and coupling introduced by any code that tries to make sure save is only called once.

I think we might be able to do something less risky by being more careful in the listener code. Instead of immediately creating a new StudentModuleHistory entry every time, we could:

  1. Check a request cache variable to see if this StudentModule id has already been modified. (I think this cache will work over the course of a celery task.)
  2. If it’s not in the cache, create the new StudentModuleHistory row and store its id as the value in the cache, so the cache values are a mapping of StudentModule IDs to the corresponding StudentModuleHistory IDs created for them.
  3. If we do find it in the cache, update that StudentModuleHistory row instead of creating a new one.

One warning: I mention storing id values in the cache instead of StudentModuleHistory objects themselves because certain grading tasks will make potentially millions of StudentModule updates and it will likely run out of memory if we start putting whole model objects in there.
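A sketch of that listener logic might look like the following. Everything here is a stand-in: `FakeHistoryStore` plays the role of the StudentModuleHistory table, and the module-to-history map plays the role of the request cache.

```python
# Sketch of the dedup idea described above; not actual edx-platform code.

_history_id_by_module_id = {}  # request-scoped map: StudentModule id -> history row id


class FakeHistoryStore:
    """Stand-in for the StudentModuleHistory table."""

    def __init__(self):
        self.rows = {}  # history id -> state snapshot
        self._next_id = 1

    def create(self, snapshot):
        history_id = self._next_id
        self._next_id += 1
        self.rows[history_id] = dict(snapshot)
        return history_id

    def update(self, history_id, snapshot):
        self.rows[history_id] = dict(snapshot)


def on_student_module_saved(module_id, snapshot, store):
    """Called from the post_save listener for each StudentModule save."""
    history_id = _history_id_by_module_id.get(module_id)
    if history_id is None:
        # First save this request: create a row and cache only its id
        # (not the whole object, to bound memory in bulk grading tasks).
        _history_id_by_module_id[module_id] = store.create(snapshot)
    else:
        # Later save in the same request: fold the change into the
        # existing row instead of creating a duplicate.
        store.update(history_id, snapshot)
```

With this in place, the grade-only save and the state save within one request would collapse into a single history row holding the final, consistent snapshot.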

Such a change would need to be well-messaged though, both here and in the release notes. It’s an obscure feature, and it’s arguably broken. But if people do rely on it, they should have a chance to weigh in on how they use it.

Given the hacky existing state, it might be worth considering a new implementation entirely.

Good luck!


Hi @dave. I have an almost working version of this RequestCache implementation to avoid duplicate StudentModuleHistory records. However, I have encountered an obstacle which may have broader implications than just this particular implementation. Either way, I wonder if you have insights or guidance to help me.

This call to recalculate_subsection_grade_v3.apply_async entirely clears the DEFAULT_REQUEST_CACHE. Below I have included an example list of keys that are present in the cache before this line of code. The cache is entirely blank afterward.

For me, this causes the next save call to think it is the first attempt at saving the StudentModule, and it therefore creates a new StudentModuleHistory record as before, making my attempted fix useless. I wonder what unintended impact this cache reset has on other functionality in the request lifecycle.

Do you have thoughts on how to proceed?

    'configuration/SplashConfig/current', 
    'configuration/DarkLangConfig/current', 
    'edx_django_utils.cache.should_force_cache_miss', 
    'edx-drf-extensions.authenticated_user_found_in_middleware', 
    'course_cache', 
    'common.djangoapps.xblock_django.models.XBlockConfiguration.common.djangoapps.xblock_django.api.disabled_xblocks', 
    'disabled_xblock_types', 
    'lms.djangoapps.courseware.modulestore_field_overrides.enabled_providers.course-v1:edX+DemoX+Demo_Course+branch@published-branch+version@630d29cdca593834bd226487', 
    'configuration/CrawlersConfig/current', 
    'lms.djangoapps.courseware.modulestore_field_overrides.enabled_providers.course-v1:edX+DemoX+Demo_Course', 
    "openedx.core.djangoapps.waffle_utils.models.override_value.<class 'openedx.core.djangoapps.waffle_utils.models.WaffleFlagCourseOverrideModel'>.course_experience.relative_dates.course-v1:edX+DemoX+Demo_Course", 
    "openedx.core.djangoapps.waffle_utils.models.override_value.<class 'openedx.core.djangoapps.waffle_utils.models.WaffleFlagOrgOverrideModel'>.course_experience.relative_dates.edX", 
    'edx-when.content_dates:course-v1:edX+DemoX+Demo_Course:policy__rel_date,::', 
    'course_dates.course-v1:edX+DemoX+Demo_Course.4.schedule-2022-06-10 21:08:27.776552+00:00', 
    'lms.djangoapps.courseware.field_overrides.enabled_providers.course-v1:edX+DemoX+Demo_Course', 
    'openedx.core.djangoapps.user_api.models.get_all_preferences.jcohen', 
    'openedx.core.djangoapps.theming.helpers.get_template_path.optioninput.html', 
    'common.djangoapps.edxmako.paths.lookup_template.main.optioninput.html', 
    'lms.djangoapps.courseware.models.student_module_history_map'

I encountered a similar strange grading issue.

See this screenshot: for one student, the individual problem scores show the student earned 64 out of 80, but the total shows 44 out of 80!

Sorry for the silly question here, but do you know why recalculate_subsection_grade_v3 is modifying StudentModule state? I thought it only triggered after a PROBLEM_WEIGHTED_SCORE_CHANGED event is received (so after the StudentModule has already changed).

Or is it because the dev environment spins up celery in-process instead of as a separate worker? Are you running development using devstack or tutor?

I do not believe recalculate_subsection_grade_v3 is directly modifying the StudentModule state. However it is running as part of the call stack that does modify the state. Here is the flow:

- capa_module.ProblemBlock.submit_problem
  - capa_module.ProblemBlock.publish_grade 
    - xblock.runtime.runtime.XBlockRuntime.handle_grade_event
      - grades.signals.handlers.score_published_handler
        - courseware.model_data.set_score (this is the first time the StudentModule is modified, to set the grade; the id is successfully added to the RequestCache)
        - grades.signals.handlers.problem_raw_score_changed_handler
          - grades.signals.handlers.enqueue_subsection_update
            - recalculate_subsection_grade_v3.apply_async (this is where the RequestCache gets wiped clean)

then the request continues getting processed and before returning we end up in:

- courseware.user_state_client.DjangoXBlockUserStateClient.set_many (this is the second time the StudentModule is saved, to set the state; the RequestCache is expected to contain the id so the existing history record gets updated)

The recalculate_subsection_grade_v3 task runs and clears the RequestCache in between the two times the StudentModule is saved. Unfortunately, by the time the request flow attempts to set the state, the RequestCache has already been cleared. As a result, a new history record gets created instead of updating the existing one.

This is a demonstration of the unexpected impact of clearing the RequestCache. I am not familiar with what other functionality relies on the RequestCache, so I can only speculate that there may be other (potentially severe) dependencies that are failing to function predictably due to the cleared RequestCache.

So this specific thing wouldn’t affect a production environment because the celery task should be run in a different process entirely–it only runs in-process in development mode. But I take your larger point that this means that anything that resets the cache could cause trouble.

But you don’t have to use the DEFAULT_REQUEST_CACHE. You can instantiate a RequestCache with your own namespace instead.
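The code snippet for this suggestion did not survive in the thread. The idea is that each namespace gets its own bucket, so clearing the default namespace leaves yours untouched. A rough pure-python stand-in of that concept (the real edx_django_utils.cache.RequestCache API may differ in its details):

```python
# Pure-python stand-in for a namespaced request cache; illustrates why a
# custom namespace would survive a clear() of only the default namespace.

_caches_by_namespace = {}  # process-wide registry: namespace -> dict


class NamespacedRequestCache:
    def __init__(self, namespace="default"):
        # All instances sharing a namespace share one bucket.
        self.data = _caches_by_namespace.setdefault(namespace, {})

    def set(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)

    def clear(self):
        # Clears only this namespace's bucket, not other namespaces.
        self.data.clear()
```

For example, clearing `NamespacedRequestCache("default")` would leave entries in a hypothetical `"student_module_history"` namespace intact.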

Can you see if that fixes your problem?

Just following up on this in case I’m misunderstanding… can you please point me to the code that does the resetting of the request cache during recalculate_subsection_grade_v3?

Hi @dave, thanks for following up. Sorry for my delayed response; I have been traveling. I never found the line of code that clears the request cache. The way I diagnosed it was with print statements (I haven’t yet figured out how to use debugger breakpoints in edx-platform with tutor) to narrow down the offending code. Printing the request cache before and after this line showed that it was populated before and empty after.

No worries. I hope that instantiating a custom named RequestCache does the trick for you. Good luck!

@dave Sorry again for my delayed follow up. I finally got around to testing your proposed change, but unfortunately the recalculate_subsection_grade_v3 task clears out all namespaces in the RequestCache. You can see my code changes here (including some troubleshooting logs that are not intended to be merged).

Since the _recalculate_subsection_grade code doesn’t appear to touch the RequestCache, I imagine this might be a side effect of the celery task. Are you familiar enough with the celery task architecture to help troubleshoot?

As you mentioned, setting CELERY_ALWAYS_EAGER = True in dev forces the task to be executed locally rather than pushed to a queue. I imagine that Celery then has task cleanup code which is clearing the thread context. Do you know the history of why we are using CELERY_ALWAYS_EAGER instead of a separate worker in dev?

https://docs.celeryq.dev/en/3.1/configuration.html#celery-always-eager

@dave Thankfully I was able to track down the cause and solve the case of the disappearing RequestCache. If you’re interested, you can see my proposed fix in this commit. There’s a method, edx-platform.openedx.core.djangoapps.util.signals._clear_request_cache, which clears the request cache upon completion of every celery task. Adding a check to that method to avoid clearing the cache when tasks are run eagerly (i.e. sharing a thread with the django request) fixes the issue.
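The shape of the guard is roughly the following (a sketch, not the actual commit; the stand-in dict replaces the real RequestCache, and how the real handler detects eager execution may differ):

```python
# Sketch of the guarded task-finished handler described above. The real
# method is openedx.core.djangoapps.util.signals._clear_request_cache.

request_cache = {}  # stand-in for the thread-local RequestCache contents


def clear_request_cache_after_task(task_always_eager):
    """Runs when a celery task finishes."""
    if task_always_eager:
        # The task ran in-process, sharing its thread (and therefore its
        # request cache) with the web request. Leave the cache alone; the
        # request middleware clears it at the end of the request anyway.
        return
    # Real worker process: clear between tasks to guard against memory leaks.
    request_cache.clear()
```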

Now that the RequestCache is reliable, the duplicate studentmodulehistory records are also fixed by the implementation you proposed. Thanks for the guidance and help!

@Jeff_Cohen: That’s a great find! And also a good reminder about how signal handlers like this can cause spooky-action-at-a-distance issues. :smile:

I think it’s just because it was simpler. For a very long time, devstack didn’t run celery processes at all. Also, if you have a bug in your celery task, it’s easier to run it all sequentially in process during development so that any exceptions thrown by your task just show up as a browser 500 error while executing the view–you don’t have to worry about reloading celery tasks, or connecting to them for debugging.

That looks great. Based on the comments in that method, it looks like the primary purpose of clearing the request cache after every celery task was to guard against memory leaks. I was worried that there was some correctness issue, which would have been much trickier to deal with. I agree with your approach for disabling it specifically when it’s being run eagerly. Memory leaks are less of an issue in dev (where the process is getting reloaded all the time). And even if it were an issue, the fact that it’s operating in the request/response cycle means that the middleware will reset the request cache anyhow.

Thank you for digging into this and finding the fix! Please just be sure to add comments explaining exactly what the thinking behind your commit was, because I’m guessing it will confuse the next person that looks at it four years from now. :smile:

Just as a quick followup, @Jeff_Cohen authored and merged these fixes:

Congratulations on your first merged Open edX PRs!