New `edxapp_csmh` MySQL database in Juniper

regis · May 13, 2020, 8:26am

Juniper introduces a new mysql database that is used by edx-platform to store the state changes of problem modules for every student. These data are stored in the coursewaremodulextended app (as far as I understand). This database is known as the student_module_history in the edx-platform code base.

Once the ENABLE_CSMH_EXTENDED feature flag has been set to true (which is the default in Juniper), there is no turning back, because it affects the behaviour of the 0011_csm_id_bigint migration from the courseware.

I would like to know if there are specific precautions ot be taken to create this database and to migrate existing MySQL from Ironwood.

The Juniper page on Jira states that “We are looking at doing large migrations for the courseware_studentmodule table to prevent running out of primary keys. These will likely be large, semi-manually done migrations on CSM and it’s history table(s). Nothing is settled yet, but if we get this in before Juniper we will need to share our runbooks for anyone upgrading to it.” (cc @bmedx)

I wonder if this comment concerns the edxapp_csmh database?

I found a couple migration scripts in the configuration repository that lead me to believe that special precautions need to be taken when upgrading. On the other hand, these scripts date back from 2016…

I have faced the following error when viewing the courseware in the LMS:

django.db.utils.ProgrammingError: (1146, "Table 'openedx_csmh.coursewarehistoryextended_studentmodulehistoryextended' doesn't exist")

I managed to resolve this error by running some of the migrations specifically on the student_history_module database:

./manage.py lms migrate --database=student_module_history coursewarehistoryextended

However I did not find any trace of such a migration command in the configuration playbooks, so I wonder how it all works?

regis · May 21, 2020, 6:44pm

The silence is deafening!

I did some investigation, and found the explanation. The student_module_history has existed for a long time, but was optional until Juniper. The use of this database is controlled by the ENABLE_CSMH_EXTENDED feature flag. This feature flag was False by default in Ironwood (although it was enabled by the configuration playbooks) and switched to True in Juniper: Always run CSMH migrations in all contexts (#21776) · openedx/edx-platform@0befab3 · GitHub

The explanation, given in the comment, is the following:

Write new CSM history to the extended table. This will eventually default to True and may be removed since all installs should have the separate extended history table.

As a devops, I must say I was surprised by the above statement. From an optimization perspective, it makes zero sense to have separate databases if they both run on the same physical server. Thus, 90-99% of Open edX administrators have nothing to win by enabling this feature. Indeed, most people just apt-get install mysql and don’t rely on cloud services to host their database. Most often, this is the best way to proceed.

Moreover, the deployment playbooks suggest that all LMS migrations should be run on this student_module_history database. Running those migrations is precisely the most time-consuming step in the installation process, which is too bad if most users don’t need it.

Anyway, setting ENABLE_CSMH_EXTENDED = False solved my problem. I’m writing this in case having a separate student_module_history database becomes mandatory one day. Then I’ll have a post to point to

EDIT: to completely get rid of the student module history database, we also need to edit the DATABASE_ROUTER setting, both in the LMS and the CMS:

DATABASE_ROUTERS.remove(
    "openedx.core.lib.django_courseware_routers.StudentModuleHistoryExtendedRouter"
)

Also, we need to remove coursewarehistoryextended from the list of installed application:

INSTALLED_APPS.remove("coursewarehistoryextended")

Finally, you will need to merge the following pull request to make sure that migrations work despite the absence of the coursewarehistoryextended application: https://github.com/edx/edx-platform/pull/24237

dave · June 23, 2020, 12:26am

I’m coming to this thread really late, but the historical context here was that in addition to running out of keys, we were running out of space. We had an old RDS instance that had a 2 TB limit in it somewhere, and we weren’t ready to migrate to Aurora at the time–so that’s why the separate database.

The other thing that’s important to note is who had to do this work. I created the original “submission history” implementation as a one day hackathon project, to help certain course teams debug student problems. Courses where students submitted things to remote graders had no good way to see what the students actually submitted and what the grader spit back. That was probably back in 2012. It was a quick hack that grew into an operational problem, as 99.9% of the accumulated data was useless and there was no cleanup.

Years later, when it became an operational pain, it was the ops folks at edX that had to figure out how to make it keep working. What probably should have happened is that TNL (or whatever feature team would have owned it at the time) should have killed the table altogether and replaced it something with narrower scope and a 1-2 month time-to-live, so it wouldn’t constantly grow. What instead happened was that the ops folks created something that completely preserved the existing functionality, and involved a lot of painful migration planning.

Long story short: Avoid it if at all possible. Having the extended history in a separate database is a hack on top of a hack. I want to raze it all.

regis · June 23, 2020, 8:29am

Thanks for the detailed explanations David, I appreciate it!

I’ll keep that in mind

Zachary_Trabookis · October 22, 2022, 5:57am

@regis @dave
So we did enable this the Django setting ENABLE_CSMH_EXTENDED long time ago and followed the instructions here https://edx.readthedocs.io/projects/open-edx-release-notes/en/latest/CSMHE/migration_procedures.html#csmhe-procedures for migrating the data to the new CSMH database table.

I guess the only thing that I would lose if we didn’t enable CSMH is the Submission History for problems right if this feature is disabled?

We already have a edxapp_csmh database that we’re moving to openedx_csmh.

I was receiving the following error when I went to a page where previous student module history was found in this csmh table.

lms_1                        | 2022-10-22 05:18:00,402 ERROR 7 [lms.djangoapps.courseware.model_data] [user 5] [ip 130.127.255.220] model_data.py:408 - Saving user state failed for ZacharyTrabookis
lms_1                        | Traceback (most recent call last):
lms_1                        |   File "/openedx/venv/lib/python3.8/site-packages/django/db/models/query.py", line 581, in get_or_create
lms_1                        |     return self.get(**kwargs), False
lms_1                        |   File "/openedx/venv/lib/python3.8/site-packages/django/db/models/query.py", line 435, in get
lms_1                        |     raise self.model.DoesNotExist(
lms_1                        | lms.djangoapps.courseware.models.StudentModule.DoesNotExist: StudentModule matching query does not exist.
lms_1                        | 
lms_1                        | During handling of the above exception, another exception occurred:
lms_1                        | 
lms_1                        | Traceback (most recent call last):
lms_1                        |   File "/openedx/venv/lib/python3.8/site-packages/django/db/backends/utils.py", line 84, in _execute
lms_1                        |     return self.cursor.execute(sql, params)
lms_1                        |   File "/openedx/venv/lib/python3.8/site-packages/django/db/backends/mysql/base.py", line 73, in execute
lms_1                        |     return self.cursor.execute(query, args)
lms_1                        |   File "/openedx/venv/lib/python3.8/site-packages/MySQLdb/cursors.py", line 206, in execute
lms_1                        |     res = self._query(query)
lms_1                        |   File "/openedx/venv/lib/python3.8/site-packages/MySQLdb/cursors.py", line 319, in _query
lms_1                        |     db.query(q)
lms_1                        |   File "/openedx/venv/lib/python3.8/site-packages/MySQLdb/connections.py", line 259, in query
lms_1                        |     _mysql.connection.query(self, query)
lms_1                        | MySQLdb._exceptions.ProgrammingError: (1146, "Table 'openedx.coursewarehistoryextended_studentmodulehistoryextended' doesn't exist")

I found out on this page Multiple databases | Django documentation | Django that the order of the DATABASE_ROUTERS matters. If I performed a python list.append('StudentModuleHistoryExtendedRouter') then I would still receive that error above because the ReadReplicaRouter would go first as indicated in the lms/envs/common.py configuration for DATABASE_ROUTERS shown below.

The order in which routers are processed is significant. Routers will be queried in the order they are listed in the DATABASE_ROUTERS setting.

We’re currently working with the maple release and so it appears that this router setting 'openedx.core.lib.django_courseware_routers.StudentModuleHistoryExtendedRouter', that’s used for the CSMH data comes first in this setting.

github.com

openedx/edx-platform/blob/open-release/maple.3/lms/envs/common.py#L1023-L1028


      
          ############################ Global Database Configuration #####################
          
          DATABASE_ROUTERS = [
              'openedx.core.lib.django_courseware_routers.StudentModuleHistoryExtendedRouter',
              'edx_django_utils.db.read_replica.ReadReplicaRouter',
          ]

Here is what I did to resolve my error above. I created a separate Tutor plugin and added the following code to these patches.

openedx-cms-production-settings
openedx-lms-production-settings

# Enable CSMH Configuration
# https://edx.readthedocs.io/projects/open-edx-release-notes/en/latest/CSMHE/migration_options.html

# Tutor disables this with Juniper moving forward per this article.
# https://discuss.openedx.org/t/new-edxapp-csmh-mysql-database-in-juniper/2127
FEATURES['ENABLE_CSMH_EXTENDED'] = True
if FEATURES['ENABLE_CSMH_EXTENDED'] is True:

    # https://edx.readthedocs.io/projects/open-edx-release-notes/en/latest/CSMHE/migration_procedures.html#disable-reads-from-the-old-table
    FEATURES['ENABLE_READING_FROM_MULTIPLE_HISTORY_TABLES'] = False

    # Update Database Configuration
    if 'student_module_history' not in DATABASES:
        DATABASES.update({
            'student_module_history': {
                'CONN_MAX_AGE': 0,
                'ENGINE': 'django.db.backends.mysql',
                'HOST': '{{ MYSQL_HOST }}',
                'NAME': '{{ OPENEDX_CSMH_MYSQL_DATABASE }}',
                'ATOMIC_REQUESTS': True,
                'OPTIONS': {
                    'init_command': "SET sql_mode='STRICT_TRANS_TABLES'"
                },
                'PASSWORD': '{{ OPENEDX_CSMH_MYSQL_PASSWORD }}',
                'PORT': '{{ MYSQL_PORT }}',
                'USER': '{{ OPENEDX_CSMH_MYSQL_USERNAME }}'
            }
        })

    # Install the app
    INSTALLED_APPS.append("lms.djangoapps.coursewarehistoryextended")

    # Make sure that this is first in the list otherwise you'll receive the following error
    # Error: MySQLdb._exceptions.ProgrammingError: (1146, "Table 'openedx.coursewarehistoryextended_studentmodulehistoryextended' doesn't exist")
    # This page https://docs.djangoproject.com/en/3.2/topics/db/multi-db/ mentions that the 
    # order in which routers are processed is significant. Routers will be queried in the order
    # they are listed in the DATABASE_ROUTERS setting.
    DATABASE_ROUTERS.insert(
        0,
        'openedx.core.lib.django_courseware_routers.StudentModuleHistoryExtendedRouter',
    )

So in this article it sounds like Submission History is not important thing anymore and that’s why these settings are configured this way with Tutor.

FEATURES['ENABLE_CSMH_EXTENDED'] = False

# Get rid completely of coursewarehistoryextended, as we do not use the CSMH database
INSTALLED_APPS.remove("lms.djangoapps.coursewarehistoryextended")
DATABASE_ROUTERS.remove(
"openedx.core.lib.django_courseware_routers.StudentModuleHistoryExtendedRouter"
)

Is what I did above appropriate if you want/need Submission History to show up for a problem component and you’re worried about the primary key for that openedx.courseware_studentmodulehistory table to reach its max primary key value?

@traek728 @becdavid

Topic		Replies	Views
ADR for removing MongoDB from edx-platform Architecture	10	2126	October 27, 2022
Extra history record stored on each problem submission Development	14	767	November 8, 2022
Nutmeg upgrade: outlines.py: "openedx.core.djangoapps.content.learning_sequences.models.LearningContext.DoesNotExist: LearningContext matching query does not exist" Site Operations Help	30	3033	November 5, 2022
Error on upgrading to Juniper Site Operations Help	1	522	October 11, 2021
Change Database to RDS and DocumentDB Site Operators how-to , juniper	4	575	October 8, 2020

New `edxapp_csmh` MySQL database in Juniper

Related topics