Seems like ClickHouse is experiencing growth in industries like IoT, Web Analytics, AdTech, Log Management, because of its robustness with big amounts of data.
IMO it’s a great tool but missing developer community support. For example there is no built-in UI, no official Helm Chart, etc. But is it a problem? I don’t think it is, it’s possible to build a robust dev / prod environment using the right tools.
In this article I am going to explain the Helm Chart I prepared for the ClickHouse.
Let’s list what do we need to have in our ClickHouse Kubernetes-based environment.
Since you decided to use ClickHouse you’re expecting a lot of data in your system. You have to prepare your cluster to be able to read and write this data. ClickHouse has different ways to setup the replication, for example circular replication cluster topology, or data distribution. What’s common here is that you need an easy way to scale your cluster, add/remove shards, or add/remove replicas.
It’s crucial to have a monitoring of everything when you deployed it, because you don’t want to be blind with TBs of data.
GUI (or optional?)
Sometimes it’s very handy to have web-based gui to run some queries, of course with limited access. It is also fine to use ClickHouse CLI.
Noone should be able to access your cluster from outside. Your services should also access CH with strict access level.
Make it easy to run locally
Reuse the same setup on all environments, except the scale.
Data should persist after any possible crash.
Tools we’re going to use
- Official ClickHouse Server Docker image
- Official ClickHouse Client Docker image
Thanks to Helm we can spin up the whole environment with a single command. Also we can configure each environment using
values.yaml config files.
Custom ClickHouse image
I had to modify the official Docker image a little bit, so it gets the NODE_ID of stateful set and puts it into macros.xml. We need macros.xml across all our servers when we use replicated tables.
ClickHouse uses Zookeeper for replication / distribution, so we have to prepare k8s statefulset, pvc and service for it (
clickhouse/templates/configd.yaml contains all configuration of ClickHouse:
- Cluster. 2 replicas by default
- Zookeeper. 2 replicas by default
- Graphite. 1 replica
- Users. writer and reader
Statefulset with PVC
ClickHouse is deployed using statefulset with k8s persistent volume attached to each pod.
ClickHouse client is deployed to the same environment.
For GUI I added Tabix.UI, where you can connect to your ClickHouse server and execute queries.
ClickHouse does not have a tool for monitoring packaged, but there are several 3rd-party monitoring solutions that can be used. Graphite is one of the popular options, and it can be natively integrated with ClickHouse.
It’s not an official chart yet, so we have to clone the repo first:
git clone email@example.com:plutov/clickhouse-helm.git cd clickhouse-helm
Run with default values:
helm install -f ./clickhouse/values.yaml --name ch --namespace=default ./clickhouse
Feel free to use this Helm Chart and propose any improvements. And let’s make an official Chart soon!