Tutor Cairn Infrastructure Sizing Recommendations?

Good Day Team!

We run an instance of open edX (nutmeg) with tutor and have the Cairn plugin installed.

We are currently using AWS EC2’s to run our infrastructure. We recently moved the Tutor Carin plugin onto it’s own EC2 as we were having some timeouts and different errors in components in the superset front end.

We are currently running Superset, Clickhouse, and Vector in an EC2 on AWS.

The majority of the errors that we are seeing is:
Code: 241. DB::Exception: Memory limit (for query) exceeded
MEMORY_LIMIT_EXCEEDED

A few questions for others that are using Cairn:

  • What does your infrastructure look like? (EC2s, EKS, Other)
  • What size is your infrastructure?
  • Do you have Clickhouse externalized outside of Docker? (RDS, other) What size of Clickhouse db do you have?
  • Are there other scaling or tuning that you have done for performance?

The CPU and RAM are not hitting 100% during these errors. So, we are trying to determine if there are configurations that we need to tune/change.

Data Stats:
Table:Count
_tracking:137,484,321
completion_blockcompletion:49,641,043
course_overviews_courseoverview:532
auth_user:166,083

Thanks in advance for your help!

This is super interesting Justin! To be honest, I was waiting for a user to face a scaling issue with the single node Clickhouse instance. With no data clearing policy, this issue is bound to happen at some point. It’s interesting that it happens with 10⁸ tracking log lines and 10⁵ users.

Could you tell us which query is causing the MEMORY_LIMIT_EXCEEDED error? This will help us figure out an adequate solution.

@regis thanks for the reply!

Here are some of the most recent queries that are hitting the MEMORY_LIMIT_EXCEEDED error for us. Some of these are from new tables and views that were added to enable some new visualizations for our instructors.

If you need more information just let me know.

Thanks in advance for your help!

Query 1

SELECT course_id AS course_id,
       user_id AS user_id,
       user_email AS user_email,
       display_name AS display_name
FROM
  (SELECT course_id,
          display_name,
          block_key,
          user_id,
          user_email
   FROM course_block_completion) AS virtual_table
WHERE user_email IN ('EMAIL-REMOVED')
ORDER BY course_id ASC,
         user_email ASC,
         display_name ASC
LIMIT 10000

Query 2

SELECT display_name AS display_name,
       COUNT(*) AS count
FROM
  (SELECT course_id,
          display_name,
          block_key,
          user_id,
          user_email
   FROM course_block_completion) AS virtual_table
WHERE user_email IN ('EMAIL-REMOVED')
GROUP BY display_name
ORDER BY count DESC
LIMIT 10000

Query 3

SELECT display_name AS display_name
FROM
  (SELECT course_id,
          display_name,
          block_key,
          user_id,
          user_email
   FROM course_block_completion) AS virtual_table
GROUP BY display_name
ORDER BY display_name ASC
LIMIT 1000

Query 4

SELECT course_number AS course_number,
      course AS course,
      student_name AS student_name,
      user_email AS user_email,
      section AS section,
      lesson AS lesson,
      unit AS unit,
      attempt AS attempt,
      grade_percentage AS grade_percentage,
      time AS time,
      course_id AS course_id
FROM openedx.problem_scores
WHERE time >= toDateTime('2023-04-11 00:00:00')
  AND time < toDateTime('2023-07-11 00:00:00')
ORDER BY course_id ASC,
         student_id ASC,
         section_id ASC,
         problem_id ASC,
         attempt ASC
LIMIT 50

Query 5

SELECT attempt AS attempt,
       AVG(grade_percentage) AS "AVG(grade_percentage)",
       max(attempt) AS "MAX(attempt)"
FROM openedx.problem_scores
WHERE time >= toDateTime('2023-06-11 00:00:00')
  AND time < toDateTime('2023-07-11 00:00:00')
GROUP BY attempt
ORDER BY "MAX(attempt)" ASC
LIMIT 25

Let’s focus on the first query:

Is this a query that you wrote yourself or did it come with the Cairn default dashboard? That ORDER BY display_name looks fishy to me. It’s very unlikely that there is an index on display_name. Given the large size of your completion_blockcompletion table, I’m guessing it could cause a memory error.

Can you try to modify the query to remove the ORDER BY display_name statement?