Issue running dump_data_to_clickhouse

I have an existing Open edX instance. I added the tutor-contrib-aspects plugin and the init jobs ran successfully, but when I then ran ./manage.py cms dump_data_to_clickhouse --object course_overviews --force, I got this error:

app@lms-7644c4fb9f-46g4n:~/edx-platform$ ./manage.py cms dump_data_to_clickhouse --object course_overviews --force

2025-08-30 10:52:39,461 INFO 167 [platform_plugin_aspects.management.commands.dump_data_to_clickhouse] [user None] [ip None] base_sink.py:178 - Now dumping 96 Course Overview to ClickHouse
2025-08-30 10:52:42,532 INFO 167 [platform_plugin_aspects.management.commands.dump_data_to_clickhouse] [user None] [ip None] base_sink.py:182 - Completed dumping 96 Course Overview to ClickHouse
2025-08-30 10:52:42,821 INFO 167 [platform_plugin_aspects.management.commands.dump_data_to_clickhouse] [user None] [ip None] base_sink.py:178 - Now dumping 5 XBlock to ClickHouse
2025-08-30 10:52:43,229 ERROR 167 [platform_plugin_aspects.management.commands.dump_data_to_clickhouse] [user None] [ip None] base_sink.py:69 - 500 Server Error: Internal Server Error for url: http://clickhouse:8123/?input_format_allow_errors_num=1&input_format_allow_errors_ratio=0.1&query=INSERT+INTO+event_sink.course_blocks+FORMAT+CSV
2025-08-30 10:52:43,230 ERROR 167 [platform_plugin_aspects.management.commands.dump_data_to_clickhouse] [user None] [ip None] base_sink.py:70 - {'Date': 'Sat, 30 Aug 2025 10:52:42 GMT', 'Connection': 'Keep-Alive', 'Content-Type': 'text/plain; charset=UTF-8', 'Access-Control-Expose-Headers': 'X-ClickHouse-Query-Id,X-ClickHouse-Summary,X-ClickHouse-Server-Display-Name,X-ClickHouse-Format,X-ClickHouse-Timezone,X-ClickHouse-Exception-Code', 'X-ClickHouse-Server-Display-Name': 'clickhouse-7bb6df678d-q8gwg', 'Transfer-Encoding': 'chunked', 'X-ClickHouse-Query-Id': '9acfbaaa-2a19-4314-a8ae-f096d86ef68b', 'X-ClickHouse-Timezone': 'UTC', 'X-ClickHouse-Exception-Code': '173', 'Keep-Alive': 'timeout=10, max=9999', 'X-ClickHouse-Summary': '{"read_rows":"15","read_bytes":"4779","written_rows":"5","written_bytes":"2235","total_rows_to_read":"0","result_rows":"0","result_bytes":"0","elapsed_ns":"404585538"}'}
2025-08-30 10:52:43,230 ERROR 167 [platform_plugin_aspects.management.commands.dump_data_to_clickhouse] [user None] [ip None] base_sink.py:71 - <Response [500]>
2025-08-30 10:52:43,230 ERROR 167 [platform_plugin_aspects.management.commands.dump_data_to_clickhouse] [user None] [ip None] base_sink.py:72 - Code: 173. DB::Exception: Couldn't allocate 181 bytes when parsing JSON: while executing 'FUNCTION JSONExtractInt(__table1.xblock_data_json : 2, 'section'_String :: 7) -> JSONExtractInt(__table1.xblock_data_json, 'section'_String) Int64 : 6': while pushing to view event_sink.dim_most_recent_course_blocks_mv (5f0eebaf-f124-4af7-8310-781e488f9f11). (CANNOT_ALLOCATE_MEMORY) (version 25.3.6.56 (official build))

2025-08-30 10:52:43,230 ERROR 167 [platform_plugin_aspects.management.commands.dump_data_to_clickhouse] [user None] [ip None] base_sink.py:217 - Error trying to dump XBlock OrderedDict([('org', 'DigiProMIN'), ('course_key', 'course-v1:DigiProMIN+CS103+2024_T23'), ('display_name', 'Emad Rad Test'), ('course_start', '2030-01-01 00:00:00+00:00'), ('course_end', None), ('enrollment_start', None), ('enrollment_end', None), ('self_paced', False), ('course_data_json', '{"advertised_start": null, "announcement": null, "lowest_passing_grade": 0.5, "invitation_only": false, "max_student_enrollments_allowed": null, "effort": null, "enable_proctored_exams": false, "entrance_exam_enabled": false, "external_id": null, "language": "en", "tags": []}'), ('created', '2025-08-29T14:00:49.162128Z'), ('modified', '2025-08-29T14:00:59.597478Z'), ('dump_id', UUID('ed8dd52f-fa46-455a-b621-1ab73f3fd176')), ('time_last_dumped', datetime.datetime(2025, 8, 30, 10, 52, 38, 229201, tzinfo=datetime.timezone.utc))]) to ClickHouse!
Traceback (most recent call last):
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/sinks/base_sink.py", line 215, in send_item_and_log
    self.send_item(serialized_item, many=many)
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/sinks/base_sink.py", line 284, in send_item
    self._send_clickhouse_request(request)
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/sinks/base_sink.py", line 66, in _send_clickhouse_request
    response.raise_for_status()
  File "/openedx/venv/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://clickhouse:8123/?input_format_allow_errors_num=1&input_format_allow_errors_ratio=0.1&query=INSERT+INTO+event_sink.course_blocks+FORMAT+CSV
Traceback (most recent call last):
  File "/openedx/edx-platform/./manage.py", line 99, in <module>
    execute_from_command_line([sys.argv[0]] + django_args)
  File "/openedx/venv/lib/python3.11/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/openedx/venv/lib/python3.11/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/openedx/venv/lib/python3.11/site-packages/django/core/management/base.py", line 412, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/openedx/venv/lib/python3.11/site-packages/django/core/management/base.py", line 458, in execute
    output = self.handle(*args, **options)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/management/commands/dump_data_to_clickhouse.py", line 198, in handle
    dump_target_objects_to_clickhouse(
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/management/commands/dump_data_to_clickhouse.py", line 77, in dump_target_objects_to_clickhouse
    sink.dump(objects_to_submit, many=True)
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/sinks/base_sink.py", line 188, in dump
    nested_sink.dump_related(
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/sinks/course_overview_sink.py", line 48, in dump_related
    self.dump(
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/sinks/base_sink.py", line 181, in dump
    self.send_item_and_log(item_id, serialized_item, many)
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/sinks/base_sink.py", line 215, in send_item_and_log
    self.send_item(serialized_item, many=many)
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/sinks/base_sink.py", line 284, in send_item
    self._send_clickhouse_request(request)
  File "/openedx/venv/lib/python3.11/site-packages/platform_plugin_aspects/sinks/base_sink.py", line 66, in _send_clickhouse_request
    response.raise_for_status()
  File "/openedx/venv/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://clickhouse:8123/?input_format_allow_errors_num=1&input_format_allow_errors_ratio=0.1&query=INSERT+INTO+event_sink.course_blocks+FORMAT+CSV

This is what I’ve done so far:

  • Tried with --batch_size 10 and --sleep_time 10. Same error.
  • I suspected a memory issue, but the machine has 7GB of free memory. Even after setting max_memory_usage to a large value and rerunning the command, the error persisted.

tutor version: v20.0.0
tutor-contrib-aspects: v2.3.1

I’ve also opened an issue upstream: Issue running `dump_data_to_clickhouse` (Issue #171, openedx/platform-plugin-aspects on GitHub).

I found the cause.
My VM was hosted on FlyingCircus, and the CPU flags for AVX weren’t exposed to the VM. After I contacted them, they switched the CPU model to EPYC-v2, which enables AVX support, and that resolved the problem.
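
For anyone who hits the same error, one quick check is whether the VM exposes AVX at all, by reading the flags line in /proc/cpuinfo. The snippet below is only a rough sketch and is not part of Aspects or ClickHouse: the helper names cpu_flags and missing_flags are made up for illustration, it assumes a Linux x86 guest, and checking only the "avx" flag is my assumption about what matters here.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux the feature list lives on the "flags" line.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, wanted=("avx",)) -> list[str]:
    """Return the flags from `wanted` that the CPU does not report."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in present]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        missing = missing_flags(cpuinfo.read_text())
        if missing:
            print("CPU flags not exposed:", ", ".join(missing))
        else:
            print("AVX is exposed to this machine")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```

Running it inside the ClickHouse pod or container (or on the VM itself) shows what the guest actually sees. If AVX is reported as not exposed, the CPU model offered by the hosting provider is worth checking before tuning memory settings.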

Glad you figured out the issue! Let us know if you run into any other problems.