Extra history record stored on each problem submission

Each time a user submits a problem answer, two records are stored in the Courseware Student Module History table. This appears to happen because the StudentModule record’s fields are updated in two separate saves, each of which triggers a post_save event that inserts a new history record.

First capa_module.ProblemBlock.submit_problem publishes a grade event, which ultimately updates the StudentModule instance in courseware.model_data.set_score. At this point only the grade and max_grade fields are updated, which leaves the StudentModule instance internally inconsistent: the state field still reflects the data from the last submission, while the grade field reflects the status of the new submission.

Then the submit_problem function completes, and the standard xblock.runtime.Runtime.handle function saves the instance again. This time all of the StudentModule’s fields are updated, so the record ends up in the correct state.
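To make the sequence concrete, here is a stand-in simulation (not real edx-platform code; the dict model and listener are hypothetical) of how two saves per submission produce two history rows, the first of which is internally inconsistent:

```python
# Stand-in illustration of why two saves per submission yield two history
# rows when a listener fires on every save. Names are made up.

history = []  # stand-in for the StudentModuleHistory table


def post_save_listener(record):
    # Mimics the signal handler: snapshot the record on every save.
    history.append(dict(record))


def save(record, **updates):
    record.update(updates)
    post_save_listener(record)


student_module = {"grade": None, "state": "attempt 3 answers"}

# First save: set_score updates only the grade -> inconsistent snapshot
# (new grade, previous submission's state).
save(student_module, grade=1.0)

# Second save: Runtime.handle writes the new state -> consistent snapshot.
save(student_module, state="attempt 4 answers")

# history now holds two rows for a single submission.
```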

However, in the process the student module history table collects extra records. For example, in this submission history, both submissions 13 and 14 show attempts: 4. Note that in submission 14 the header score is 1.0 / 1.0, but in the JSON, score.raw_earned still shows 0 and the correct_map still says incorrect. This is because the state is left over, unchanged, from the earlier submission 13. Only when submission 15 is saved one second later does the header score match score.raw_earned.

#15: 2022-08-12 19:27:32 UTC

Score: 1.0 / 1.0
{
  "attempts": 5,
  "correct_map": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {
      "answervariable": null,
      "correctness": "correct",
      "hint": "",
      "hintmode": null,
      "msg": "",
      "npoints": null,
      "queuestate": null
    }
  },
  "done": true,
  "input_state": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {}
  },
  "last_submission_time": "2022-08-12T19:27:31Z",
  "score": {
    "raw_earned": 1,
    "raw_possible": 1
  },
  "seed": 1,
  "student_answers": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": [
      "choice_3"
    ]
  }
}


#14: 2022-08-12 19:27:31 UTC

Score: 1.0 / 1.0
{
  "attempts": 4,
  "correct_map": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {
      "answervariable": null,
      "correctness": "incorrect",
      "hint": "",
      "hintmode": null,
      "msg": "",
      "npoints": null,
      "queuestate": null
    }
  },
  "done": true,
  "input_state": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {}
  },
  "last_submission_time": "2022-08-12T19:27:20Z",
  "score": {
    "raw_earned": 0,
    "raw_possible": 1
  },
  "seed": 1,
  "student_answers": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": [
      "choice_0",
      "choice_1"
    ]
  }
}

    
#13: 2022-08-12 19:27:22 UTC

Score: 0.0 / 1.0
{
  "attempts": 4,
  "correct_map": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {
      "answervariable": null,
      "correctness": "incorrect",
      "hint": "",
      "hintmode": null,
      "msg": "",
      "npoints": null,
      "queuestate": null
    }
  },
  "done": true,
  "input_state": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": {}
  },
  "last_submission_time": "2022-08-12T19:27:20Z",
  "score": {
    "raw_earned": 0,
    "raw_possible": 1
  },
  "seed": 1,
  "student_answers": {
    "9cee77a606ea4c1aa5440e0ea5d0f618_2_1": [
      "choice_0",
      "choice_1"
    ]
  }
}

Do you have advice for how to avoid collecting extra noise and uphold the credibility of our learner performance dashboards?

Thank you!


I agree with your analysis on what’s going on and why. I added StudentModuleHistory during an early hackathon as a kludge to help the Intro to Python course team debug their student answers. So it does the dumbest thing possible and just listens for the save signal. It’s one of those expedient hacks that never quite made it to the point of becoming a “real” feature, while being just useful enough and small enough to avoid getting killed off. It remains a debug tool, and was never part of the researcher export like StudentModule was.

Some interesting edge cases:

  • State is often updated with no update to the score.
  • Score is sometimes updated with no update to student state (e.g. running a rescoring job after a bug was fixed in ProblemBlock).
  • Changes to StudentModule can take place in either web transactions or celery tasks (so we can’t rely on there being a request_finished signal).
  • State and score are sometimes out of sync for a period of time because the problem is being submitted to a remote grader via XQueue and the grader may take seconds or minutes to reply with the resulting score change. This was the original motivating debug scenario.

It would be nice if StudentModule student state and scoring info changed in a lock-step fashion, though the remote grader involvement means that it won’t always be that way. But the XBlock state management and scoring are different systems, and I worry about unintended side-effects and coupling introduced by any code that tries to make sure save is only called once.

I think we might be able to do something less risky by being more careful in the listener code. Instead of immediately creating a new StudentModuleHistory entry every time, we could:

  1. Check a request cache variable to see if this StudentModule id has already been modified. (I think this cache will work over the course of a celery task.)
  2. If it’s not in the cache, create the new StudentModuleHistory row and store its id as the value in the cache, so the cache values are a mapping of StudentModule IDs to the corresponding StudentModuleHistory IDs created for them.
  3. If we do find it in the cache, update that StudentModuleHistory row instead of creating a new one.

One warning: I mention storing id values in the cache instead of StudentModuleHistory objects themselves because certain grading tasks will make potentially millions of StudentModule updates and it will likely run out of memory if we start putting whole model objects in there.
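A sketch of that listener logic might look like the following. Everything here is a stand-in: `FakeHistoryStore` plays the role of the StudentModuleHistory table, and the module-to-history map plays the role of the request cache.

```python
# Sketch of the dedup idea described above; not actual edx-platform code.

_history_id_by_module_id = {}  # request-scoped map: StudentModule id -> history row id


class FakeHistoryStore:
    """Stand-in for the StudentModuleHistory table."""

    def __init__(self):
        self.rows = {}  # history id -> state snapshot
        self._next_id = 1

    def create(self, snapshot):
        history_id = self._next_id
        self._next_id += 1
        self.rows[history_id] = dict(snapshot)
        return history_id

    def update(self, history_id, snapshot):
        self.rows[history_id] = dict(snapshot)


def on_student_module_saved(module_id, snapshot, store):
    """Called from the post_save listener for each StudentModule save."""
    history_id = _history_id_by_module_id.get(module_id)
    if history_id is None:
        # First save this request: create a row and cache only its id
        # (not the whole object, to bound memory in bulk grading tasks).
        _history_id_by_module_id[module_id] = store.create(snapshot)
    else:
        # Later save in the same request: fold the change into the
        # existing row instead of creating a duplicate.
        store.update(history_id, snapshot)
```

With this in place, the grade-only save and the state save within one request would collapse into a single history row holding the final, consistent snapshot.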

Such a change would need to be well-messaged though, both here and in the release notes. It’s an obscure feature, and it’s arguably broken. But if people do rely on it, they should have a chance to weigh in on how they use it.

Given the hacky existing state, it might be worth considering a new implementation entirely.

Good luck!


Hi @dave. I have an almost working version of this RequestCache implementation to avoid duplicate StudentModuleHistory records. However, I have encountered an obstacle which may have broader implications than just this particular implementation. Either way, I wonder if you have insights or guidance to help me.

This call to recalculate_subsection_grade_v3.apply_async entirely clears the DEFAULT_REQUEST_CACHE. Below I have included an example list of keys that are present in the cache before this line of code. The cache is entirely blank afterward.

For me, this causes the next save call to think it is the first attempt at saving the StudentModule, and it therefore creates a new StudentModuleHistory record as before, making my attempted fix useless. I wonder what unintended impact this cache reset has on other functionality in the request lifecycle.

Do you have thoughts on how to proceed?

    'configuration/SplashConfig/current', 
    'configuration/DarkLangConfig/current', 
    'edx_django_utils.cache.should_force_cache_miss', 
    'edx-drf-extensions.authenticated_user_found_in_middleware', 
    'course_cache', 
    'common.djangoapps.xblock_django.models.XBlockConfiguration.common.djangoapps.xblock_django.api.disabled_xblocks', 
    'disabled_xblock_types', 
    'lms.djangoapps.courseware.modulestore_field_overrides.enabled_providers.course-v1:edX+DemoX+Demo_Course+branch@published-branch+version@630d29cdca593834bd226487', 
    'configuration/CrawlersConfig/current', 
    'lms.djangoapps.courseware.modulestore_field_overrides.enabled_providers.course-v1:edX+DemoX+Demo_Course', 
    "openedx.core.djangoapps.waffle_utils.models.override_value.<class 'openedx.core.djangoapps.waffle_utils.models.WaffleFlagCourseOverrideModel'>.course_experience.relative_dates.course-v1:edX+DemoX+Demo_Course", 
    "openedx.core.djangoapps.waffle_utils.models.override_value.<class 'openedx.core.djangoapps.waffle_utils.models.WaffleFlagOrgOverrideModel'>.course_experience.relative_dates.edX", 
    'edx-when.content_dates:course-v1:edX+DemoX+Demo_Course:policy__rel_date,::', 
    'course_dates.course-v1:edX+DemoX+Demo_Course.4.schedule-2022-06-10 21:08:27.776552+00:00', 
    'lms.djangoapps.courseware.field_overrides.enabled_providers.course-v1:edX+DemoX+Demo_Course', 
    'openedx.core.djangoapps.user_api.models.get_all_preferences.jcohen', 
    'openedx.core.djangoapps.theming.helpers.get_template_path.optioninput.html', 
    'common.djangoapps.edxmako.paths.lookup_template.main.optioninput.html', 
    'lms.djangoapps.courseware.models.student_module_history_map'

I encountered a similar strange grading issue.

See this screenshot: for one student, the individual problem scores show the student earned 64 out of 80, but the total shows 44 out of 80!

Sorry for the silly question here, but do you know why recalculate_subsection_grade_v3 is modifying StudentModule state? I thought it only triggered after a PROBLEM_WEIGHTED_SCORE_CHANGED event is received (so after the StudentModule has already changed).

Or is it because the dev environment spins up celery in-process instead of as a separate worker? Are you running development using devstack or tutor?

I do not believe recalculate_subsection_grade_v3 is directly modifying the StudentModule state. However it is running as part of the call stack that does modify the state. Here is the flow:

- capa_module.ProblemBlock.submit_problem
  - capa_module.ProblemBlock.publish_grade 
    - xblock.runtime.runtime.XBlockRuntime.handle_grade_event
      - grades.signals.handlers.score_published_handler
        - courseware.model_data.set_score (this is the first time the StudentModule is modified, to set the grade; the id is successfully added to the RequestCache)
        - grades.signals.handlers.problem_raw_score_changed_handler
          - grades.signals.handlers.enqueue_subsection_update
            - recalculate_subsection_grade_v3.apply_async (this is where the RequestCache gets wiped clean)

then the request continues getting processed and before returning we end up in:

- courseware.user_state_client.DjangoXBlockUserStateClient.set_many (this is the second time the StudentModule is saved, to set the state; the RequestCache is expected to contain the id so the existing history record gets updated)

The recalculate_subsection_grade_v3 task runs and clears the RequestCache in between the two times the StudentModule is saved. Unfortunately, by the time the request flow attempts to set the state, the RequestCache has already been cleared. As a result, a new history record gets created instead of updating the existing one.

This is a demonstration of the unexpected impact of clearing the RequestCache. I am not familiar with what other functionality relies on the RequestCache, so I can only speculate that there may be other (potentially severe) dependencies that are failing to function predictably due to the cleared RequestCache.

So this specific thing wouldn’t affect a production environment because the celery task should be run in a different process entirely–it only runs in-process in development mode. But I take your larger point that this means that anything that resets the cache could cause trouble.

But you don’t have to use the DEFAULT_REQUEST_CACHE. You can instantiate a RequestCache with your own namespace instead.
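The code snippet for this suggestion did not survive in the thread. The idea is that each namespace gets its own bucket, so clearing the default namespace leaves yours untouched. A rough pure-python stand-in of that concept (the real edx_django_utils.cache.RequestCache API may differ in its details):

```python
# Pure-python stand-in for a namespaced request cache; illustrates why a
# custom namespace would survive a clear() of only the default namespace.

_caches_by_namespace = {}  # process-wide registry: namespace -> dict


class NamespacedRequestCache:
    def __init__(self, namespace="default"):
        # All instances sharing a namespace share one bucket.
        self.data = _caches_by_namespace.setdefault(namespace, {})

    def set(self, key, value):
        self.data[key] = value

    def get(self, key, default=None):
        return self.data.get(key, default)

    def clear(self):
        # Clears only this namespace's bucket, not other namespaces.
        self.data.clear()
```

For example, clearing `NamespacedRequestCache("default")` would leave entries in a hypothetical `"student_module_history"` namespace intact.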

Can you see if that fixes your problem?

Just following up on this in case I’m misunderstanding… can you please point me to the code that does the resetting of the request cache during recalculate_subsection_grade_v3?

Hi @dave, thanks for following up. Sorry for my delayed response; I have been traveling. I never found the line of code that clears the request cache. The way I diagnosed it was with print statements (I haven’t yet figured out how to use debugger breakpoints in edx-platform with tutor) to narrow down the offending code. Printing the request cache before and after this line showed that it was populated before and empty after.

No worries. I hope that instantiating a custom named RequestCache does the trick for you. Good luck!

@dave Sorry again for my delayed follow up. I finally got around to testing your proposed change, but unfortunately the recalculate_subsection_grade_v3 task clears out all namespaces in the RequestCache. You can see my code changes here (including some troubleshooting logs that are not intended to be merged).

Since the _recalculate_subsection_grade code doesn’t appear to touch the RequestCache, I imagine this might be a side effect of the celery task. Are you familiar enough with the celery task architecture to help troubleshoot?

As you mentioned, setting CELERY_ALWAYS_EAGER = True in dev forces the task to be executed locally rather than pushed to a queue. I imagine that Celery then has task cleanup code which is clearing the thread context. Do you know the history of why we are using CELERY_ALWAYS_EAGER instead of a separate worker in dev?

https://docs.celeryq.dev/en/3.1/configuration.html#celery-always-eager

@dave Thankfully I was able to track down the cause and solve the case of the disappearing RequestCache. If you’re interested, you can see my proposed fix in this commit. There’s a method, edx-platform.openedx.core.djangoapps.util.signals._clear_request_cache, which clears the request cache upon completion of every celery task. Adding a check to that method to avoid clearing the cache when tasks are run eagerly (i.e. sharing a thread with the django request) fixes the issue.
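The shape of the guard is roughly the following (a sketch, not the actual commit; the stand-in dict replaces the real RequestCache, and how the real handler detects eager execution may differ):

```python
# Sketch of the guarded task-finished handler described above. The real
# method is openedx.core.djangoapps.util.signals._clear_request_cache.

request_cache = {}  # stand-in for the thread-local RequestCache contents


def clear_request_cache_after_task(task_always_eager):
    """Runs when a celery task finishes."""
    if task_always_eager:
        # The task ran in-process, sharing its thread (and therefore its
        # request cache) with the web request. Leave the cache alone; the
        # request middleware clears it at the end of the request anyway.
        return
    # Real worker process: clear between tasks to guard against memory leaks.
    request_cache.clear()
```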

Now that the RequestCache is reliable, the duplicate studentmodulehistory records are also fixed by the implementation you proposed. Thanks for the guidance and help!

@Jeff_Cohen: That’s a great find! And also a good reminder about how signal handlers like this can cause spooky-action-at-a-distance issues. :smile:

I think it’s just because it was simpler. For a very long time, devstack didn’t run celery processes at all. Also, if you have a bug in your celery task, it’s easier to run it all sequentially in process during development so that any exceptions thrown by your task just show up as a browser 500 error while executing the view–you don’t have to worry about reloading celery tasks, or connecting to them for debugging.

That looks great. Based on the comments in that method, it looks like the primary purpose of clearing the request cache after every celery task was to guard against memory leaks. I was worried that there was some correctness issue, which would have been much trickier to deal with. I agree with your approach for disabling it specifically when it’s being run eagerly. Memory leaks are less of an issue in dev (where the process is getting reloaded all the time). And even if it were an issue, the fact that it’s operating in the request/response cycle means that the middleware will reset the request cache anyhow.

Thank you for digging into this and finding the fix! Please just be sure to add comments explaining exactly what the thinking behind your commit was, because I’m guessing it will confuse the next person that looks at it four years from now. :smile:

Just as a quick followup, @Jeff_Cohen authored and merged these fixes:

Congratulations on your first merged Open edX PRs!