How to Forward Data from Aspects LRS (Ralph/ClickHouse) to an External Database

I’m currently using the Aspects Learner Analytics plugin via Tutor, which integrates several tools like ClickHouse, Apache Superset, and Ralph (LRS) for tracking and visualizing learner activity. The system is working well, and xAPI statements are successfully stored in ClickHouse.

Now, I would like to forward or replicate the tracked data (from LRS or ClickHouse) into another external database (e.g., PostgreSQL or MySQL) for further custom analytics and integration with other systems.

Has anyone here done something similar or can guide me on:

  • Best practices for exporting or syncing xAPI data from ClickHouse or Ralph
  • Tools or methods to extract and transfer data regularly
  • Whether there’s native support for this in Vector or another part of the Aspects pipeline

@Sara_Burns any insight here?

Hi @Omer_Hameed! I’m so happy to hear everything is working well for you. I don’t believe we currently have any documentation about getting data out of ClickHouse; your best bet would be to look at the ClickHouse or SQL documentation. From a quick Google search, it seems like most articles are about getting data INTO ClickHouse, so you may have to export the data to a CSV and then import it into SQL.
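For what it’s worth, here is a rough sketch of that CSV route, assuming the xapi.xapi_events_all_parsed table discussed later in this thread and a pre-created MySQL table (the column list and target table name are illustrative):

# Export from ClickHouse to CSV, run wherever clickhouse-client can reach CH:
$ clickhouse-client --query "SELECT event_id, verb_id FROM xapi.xapi_events_all_parsed FORMAT CSVWithNames" > xapi_events.csv

mysql> -- Import on the MySQL side (requires local_infile to be enabled):
mysql> LOAD DATA LOCAL INFILE 'xapi_events.csv' INTO TABLE xapi_events
    ->   FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    ->   IGNORE 1 LINES (id, verb);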

I would recommend using the ClickHouse data over Ralph, because the data in ClickHouse is the result of pulling the important fields out of the original xAPI statements and performing many transformations to populate various data tables (enrollments, video watches, problem attempts, engagement, etc.). If you have questions about where to find different data within ClickHouse, you can refer to the dbt docs here, the Aspects documentation here, or post your question for the community using the Aspects tag.

Hope this helps!

Hi @Omer_Hameed! Just to add on to what Sara said, you could also create a table with the MySQL or PostgreSQL table engine in ClickHouse itself and run a query to insert new events on a schedule (or possibly use a ClickHouse materialized view to insert in real time, as sketched below). That may be the easiest solution.
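Here’s a minimal sketch of the materialized-view variant. The mysql_sink and forward_xapi_to_mysql names are made up, and the connection details are placeholders (the full walkthrough further down uses the same pieces):

-- Hypothetical MySQL-engine table acting as the sink; host and credentials
-- are placeholders you would swap for your own.
CREATE TABLE IF NOT EXISTS default.mysql_sink
(
    id varchar(64) not null,
    verb varchar(255) not null
) ENGINE = MySQL('mysql-host:3306', 'clickhouse_test_db', 'test_verb', 'root', '<root password>');

-- A materialized view fires on every INSERT into its source table, so new
-- parsed events get forwarded to the MySQL-engine table as they arrive.
CREATE MATERIALIZED VIEW IF NOT EXISTS default.forward_xapi_to_mysql
TO default.mysql_sink
AS SELECT event_id AS id, verb_id AS verb
FROM xapi.xapi_events_all_parsed;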

If you are already running Vector as your pipeline, you may be able to use Aspects’ vector-common-toml Tutor patch to add a Postgres sink to the Vector configuration, similar to the ClickHouse implementation; this would also get you the raw xAPI events. Vector doesn’t seem to have a MySQL sink at this time.
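The patch body might look roughly like this; the sink options follow Vector’s postgres sink documentation, but the input name, endpoint, and table here are assumptions you’d need to adapt to your deployment:

# Hypothetical vector-common-toml patch body; "xapi" as the input name,
# the endpoint, and the table are placeholders, not Aspects defaults.
[sinks.postgres_raw_xapi]
type = "postgres"
inputs = ["xapi"]
endpoint = "postgres://vector:<password>@postgres-host:5432/xapi_raw"
table = "raw_xapi_events"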

As Sara noted, there are other data sources that get populated aside from the xAPI events, such as the course structure and user data, and you would probably need other solutions for those as well. You could also just get these from ClickHouse using the MySQL table engine. Since that data comes from other pipelines, it does not go through Vector, so you wouldn’t be able to get at it that way.

I’m sure other people will have similar needs, so please let us know how you go about it and how things work out! Aspects is still fairly new, I’m sure there are many things we can improve.

I’m curious to learn more about your vision and why PostgreSQL or MySQL is the preferred warehouse for further custom analytics. I would have thought that ClickHouse is the more powerful analytics engine and therefore the right warehouse for pulling in additional data from other sources such as PostgreSQL and MySQL, rather than exporting from ClickHouse.

Thanks for the clarification! Currently, exporting to CSV isn’t an option in my ClickHouse setup. I can access the complete tracking data directly inside ClickHouse, but I don’t have a way to move or extract it externally for further use.

I did find one document about updating the internal configuration for external connectivity, but it seems to involve a lot of custom configuration that could potentially affect the stability of my system. Is there any official or recommended documentation on how to safely configure a custom ClickHouse instance for external access?

Thanks for sharing this approach! The main limitation I’m running into is that ClickHouse by default runs inside a local container, which means third-party platforms can’t connect to it from outside.

For the solution you suggested (using a MySQL/Postgres table engine in ClickHouse or configuring a Vector sink), do you have any official documentation or links you’d recommend? That would really help me understand how to implement this safely.

I’m not sure I can explain it better than the links I gave above, but the short version is that using the MySQL or Postgres table engines in ClickHouse means that ClickHouse connects out to those servers, not the other way around. So as long as your ClickHouse container can reach the MySQL or Postgres server you should be OK (and if not, we probably can’t help you with networking here).

The steps to do a full test are fairly simple (assuming MySQL below):

Set up a database / table to write to and a user with permissions to write (I’m using root below for simplicity):

mysql> create database clickhouse_test_db;
Query OK, 1 row affected (0.01 sec)

mysql> use clickhouse_test_db;
Database changed

mysql> create table test_verb (id varchar(64) not null, verb varchar(255) not null);
Query OK, 0 rows affected (0.02 sec)

Then set up ClickHouse to connect to the given table:

ba4310497a7e :) CREATE TABLE IF NOT EXISTS default.mysql_test
(
    id varchar(64) not null,
    verb varchar(255) not null
) ENGINE = MySQL('mysql-host:3306', 'clickhouse_test_db', 'test_verb', 'root', '<root password>');

CREATE TABLE IF NOT EXISTS default.mysql_test
(
    `id` varchar(64) NOT NULL,
    `verb` varchar(255) NOT NULL
)
ENGINE = MySQL('mysql-host:3306', 'clickhouse_test_db', 'test_verb', 'root', '<root password>')

Query id: 7c8dc4b5-97f6-4c42-a8de-ca5f8fbe4459

Ok.

0 rows in set. Elapsed: 0.003 sec.

Check that the MySQL table is empty (queried from ClickHouse):

ba4310497a7e :) select * from default.mysql_test;

SELECT *
FROM default.mysql_test

Query id: 583509dc-df23-4e1b-910e-054d5b064441

Ok.

0 rows in set. Elapsed: 0.012 sec.

Insert some data into MySQL from ClickHouse (this presumes you have data in ClickHouse’s xapi.xapi_events_all_parsed table; if it’s a lot of data, you may just want to throw a LIMIT on the query):

ba4310497a7e :) insert into default.mysql_test (id, verb) select event_id, verb_id from xapi.xapi_events_all_parsed;

INSERT INTO default.mysql_test (id, verb) SELECT
    event_id,
    verb_id
FROM xapi.xapi_events_all_parsed

Query id: 626bd10e-330d-4692-8628-e9974f961baf

Ok.

0 rows in set. Elapsed: 0.134 sec. Processed 11.00 thousand rows, 195.99 KB (82.27 thousand rows/s., 1.47 MB/s.)
Peak memory usage: 748.37 KiB.

Select from MySQL in ClickHouse again:

ba4310497a7e :) select * from default.mysql_test limit 5;

SELECT *
FROM default.mysql_test
LIMIT 5

Query id: b4b5ef90-8e86-4d90-a32b-6cdfa709c297

   ┌─id───────────────────────────────────┬─verb─────────────────────────────────────┐
1. │ 529c2e05-013d-49a3-b79d-7250f65422f3 │ http://adlnet.gov/expapi/verbs/attempted │
2. │ 1eb33300-203a-40ed-8084-f18ff1b10e2e │ http://adlnet.gov/expapi/verbs/attempted │
3. │ d2fb3795-3228-43be-9f94-d27e1b6bd339 │ http://adlnet.gov/expapi/verbs/attempted │
4. │ a674a8fd-8baf-4052-b578-0b13024f4009 │ http://adlnet.gov/expapi/verbs/attempted │
5. │ 55a484af-9f77-416a-87ce-655f07ba62f7 │ http://adlnet.gov/expapi/verbs/attempted │
   └──────────────────────────────────────┴──────────────────────────────────────────┘

5 rows in set. Elapsed: 0.007 sec. Processed 11.00 thousand rows, 1.04 MB (1.67 million rows/s., 157.55 MB/s.)
Peak memory usage: 11.95 KiB.

And on the MySQL side:

mysql> select * from test_verb limit 5;
+--------------------------------------+------------------------------------------+
| id                                   | verb                                     |
+--------------------------------------+------------------------------------------+
| 529c2e05-013d-49a3-b79d-7250f65422f3 | http://adlnet.gov/expapi/verbs/attempted |
| 1eb33300-203a-40ed-8084-f18ff1b10e2e | http://adlnet.gov/expapi/verbs/attempted |
| d2fb3795-3228-43be-9f94-d27e1b6bd339 | http://adlnet.gov/expapi/verbs/attempted |
| a674a8fd-8baf-4052-b578-0b13024f4009 | http://adlnet.gov/expapi/verbs/attempted |
| 55a484af-9f77-416a-87ce-655f07ba62f7 | http://adlnet.gov/expapi/verbs/attempted |
+--------------------------------------+------------------------------------------+
5 rows in set (0.00 sec)

That’s the basics of it. Any query you can write in ClickHouse can then materialize its results to a table in one of the other databases. You just need a mechanism to run those queries on a timer to keep the data fresh, as in the sketch below.
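As a rough sketch of that timer, assuming cron on a host where clickhouse-client can reach ClickHouse, and assuming the emission_time column on xapi.xapi_events_all_parsed (you’d also need to handle duplicates on the MySQL side, which this doesn’t):

# Hypothetical cron entry: every 15 minutes, forward events from the last
# 20 minutes. The overlap guards against a missed run but will re-insert
# rows, so the target table needs a dedup strategy (an assumption here).
*/15 * * * * clickhouse-client --query "INSERT INTO default.mysql_test (id, verb) SELECT event_id, verb_id FROM xapi.xapi_events_all_parsed WHERE emission_time > now() - INTERVAL 20 MINUTE"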