Hi, has anyone successfully run ClickHouse? I’ve enabled platform-plugin-aspects, but it’s consuming too much memory. I’m running it on a Kubernetes setup with a dedicated 32GB node for ClickHouse.
Here are my user_config.xml settings:
We run ClickHouse often, in different configurations for different customers. Without knowing the specifics of your instance, in terms of the traffic it receives, it is difficult to judge, but I can share some experiences we have had.
First of all, we mostly install ClickHouse outside of the Kubernetes cluster, on a separate server (or a separate set of servers if we are running it as a cluster). The key limitation for ClickHouse is disk speed. Make sure you are reaching somewhere in the order of megabytes per second when running randomized synchronous inserts.
Running a command such as time dd if=/dev/zero of=/tmp/.wlog bs=4k count=50000 oflag=dsync status=progress && rm -f /tmp/.wlog should give you roughly 1.5 MB/s or better. Sometimes all the abstraction layers from k8s or from network storage cut significantly into that throughput.
Also, it looks like you are constraining the performance of the database in many dimensions; this might be creating more issues than it solves.
Hi @Felipe thanks for the detailed insights - especially around disk and avoiding over-constraining the setup.
We did some additional testing on our side and noticed something concerning around memory usage with the query below. I updated my pod limit to 9 GB and max_memory_usage to 5 GB. FYI, I'm using aspects version 3.0.3.
For example:
SELECT * FROM reporting.fact_pageview_engagement LIMIT 1;
This query processed ~217K rows (~42 MB), but:
Peak memory usage: ~4.96 GiB
Eventually failed with:
MEMORY_LIMIT_EXCEEDED (limit set to 5 GiB)
Failure occurred during JoiningTransform
This seems unusually high for such a small dataset, especially given that:
The overall pod memory limit is 9 GiB
So I’m trying to understand:
Why would such a query trigger ~5 GB of memory usage?
Could this be related to how joins are executed in ClickHouse (e.g., full in-memory joins)?
Is aspects generating queries or schemas that are particularly memory-intensive?
Are there recommended settings to control join memory usage more aggressively?
Also, your point about Kubernetes overhead and storage is interesting - we haven't yet benchmarked disk throughput inside the pod, so we'll look into it next.
We’ll also take a look at using Vector as the ingestion pipeline as suggested.
Any guidance on the join-related memory behavior (or tuning for aspects workloads specifically) would be really helpful.
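In case it's useful for comparing runs, per-query peak memory can be pulled from system.query_log — a sketch using the standard query_log columns (adjust the time window to taste):

```sql
-- Recent finished queries ranked by peak memory usage
SELECT
    event_time,
    formatReadableSize(memory_usage) AS peak_mem,
    read_rows,
    substring(query, 1, 80) AS query_head
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY memory_usage DESC
LIMIT 10;
```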
In your example, fact_pageview_engagement is a view within ClickHouse, so when you run that select query, ClickHouse is actually querying fact_navigation_completion which is also a view selecting from navigation_events… We try to balance views vs tables so both inserts (when events are created) and selects (for the dashboards) are as efficient as possible.
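You can see what the view actually expands to (and why a LIMIT 1 can still read ~217K rows) with EXPLAIN — a quick sketch:

```sql
-- Show the rewritten query after view expansion:
EXPLAIN SYNTAX SELECT * FROM reporting.fact_pageview_engagement LIMIT 1;

-- Or the execution pipeline, which should include the JoiningTransform
-- step from your error message:
EXPLAIN PIPELINE SELECT * FROM reporting.fact_pageview_engagement LIMIT 1;
```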
We try to make sure the dashboards load in a reasonable amount of time with a large amount of data - are you seeing issues in Superset?
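On the join-memory question specifically: these are not Aspects-specific recommendations, just generic ClickHouse knobs worth experimenting with per session. The setting names are standard ClickHouse, but availability varies by version (grace_hash needs 22.12+), so treat this as a sketch:

```sql
-- Per-session knobs for constraining join memory:
SET join_algorithm = 'grace_hash';    -- spills join buckets to disk instead of
                                      -- building one big in-memory hash table
SET max_bytes_in_join = 1000000000;   -- ~1 GB cap on the right-hand side of a join
SET join_overflow_mode = 'break';     -- return a partial result instead of
                                      -- throwing when the cap above is hit
SET max_memory_usage = 5000000000;    -- hard per-query cap (~5 GB)
```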