Ultimate ClickHouse Helm Chart

Wed, Jan 2, 2019 3-minute read


ClickHouse seems to be gaining traction in industries such as IoT, web analytics, AdTech and log management, thanks to how well it handles large volumes of data.

IMO it’s a great tool, but it lacks developer community support: there is no built-in UI, no official Helm Chart, and so on. Is that a problem? I don’t think so; it’s possible to build a robust dev / prod environment using the right tools.

In this article I am going to explain the Helm Chart I prepared for ClickHouse.

Environment requirements

Let’s list what we need in our Kubernetes-based ClickHouse environment.

Scalability

Since you decided to use ClickHouse, you’re expecting a lot of data in your system, and your cluster has to be able to read and write it. ClickHouse offers different ways to set up replication and data distribution, for example a circular replication cluster topology. What they have in common is that you need an easy way to scale your cluster: add or remove shards, or add or remove replicas.

Monitoring

It’s crucial to monitor everything once it is deployed, because you don’t want to be blind with TBs of data.

GUI (optional)

Sometimes it’s very handy to have a web-based GUI to run some queries, with limited access of course. It is also fine to use the ClickHouse CLI.

Security

No one should be able to access your cluster from outside. Your services should also access ClickHouse with strictly limited permissions.

Make it easy to run locally

Reuse the same setup in all environments; only the scale should differ.

Persistence

Data should persist after any possible crash.

Tools we’re going to use

Helm Chart

Thanks to Helm we can spin up the whole environment with a single command. We can also configure each environment with its own values.yaml config file.

Custom ClickHouse image

I had to modify the official Docker image a little, so that it gets the NODE_ID from the StatefulSet and puts it into macros.xml. We need macros.xml on all our servers when we use replicated tables, because it holds the per-server values that are substituted into replicated table definitions.
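To make the idea more concrete, here is a minimal sketch of how an entrypoint script could derive a node ID from the pod name and render macros.xml. It is not the code from the custom image. It assumes the usual StatefulSet naming scheme, where each pod hostname is the StatefulSet name followed by a dash and an ordinal. The function name render_macros, the macro names replica and node_id, and the target path in the comment are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a macros.xml document for a given StatefulSet pod name.
# Assumes pods are named <statefulset-name>-<ordinal>, e.g. clickhouse-0, clickhouse-1.
render_macros() {
    pod_name="$1"
    # Strip everything up to and including the last dash, leaving the ordinal.
    node_id="${pod_name##*-}"
    # The macro names below are illustrative; use whatever your table definitions reference.
    cat <<EOF
<yandex>
    <macros>
        <replica>${pod_name}</replica>
        <node_id>${node_id}</node_id>
    </macros>
</yandex>
EOF
}

# In a container entrypoint this could be written to the config directory
# before the server starts (the path is an assumption):
# render_macros "$(hostname)" > /etc/clickhouse-server/config.d/macros.xml
```

Because the pod name stays stable across restarts of a StatefulSet pod, each server would get the same macro values every time it comes back up.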

ZooKeeper

ClickHouse uses ZooKeeper to coordinate replication and distributed DDL queries, so we have to prepare a k8s StatefulSet, PVC and Service for it (clickhouse/templates/zookeeper.yaml).

Configd

clickhouse/templates/configd.yaml contains all of the ClickHouse configuration:

  • Cluster. 2 replicas by default
  • ZooKeeper. 2 replicas by default
  • Graphite. 1 replica
  • Users. writer and reader

StatefulSet with PVC

ClickHouse is deployed as a StatefulSet with a k8s persistent volume attached to each pod.

CLI

The ClickHouse client is deployed to the same environment.

GUI

For the GUI I added Tabix.UI, where you can connect to your ClickHouse server and execute queries.

Graphite

ClickHouse does not ship with a monitoring tool, but there are several third-party monitoring solutions that can be used. Graphite is one of the popular options, and ClickHouse integrates with it natively.

Run it

It’s not an official chart yet, so we have to clone the repo first:

git clone git@github.com:plutov/clickhouse-helm.git
cd clickhouse-helm

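Run with default values (the command below uses Helm 2 syntax; Helm 3 takes the release name as a positional argument instead of --name):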

helm install -f ./clickhouse/values.yaml --name ch --namespace=default ./clickhouse

Conclusion

Feel free to use this Helm Chart and propose any improvements. And let’s make an official Chart soon!