Moscow Meetup, Cutting Edge ClickHouse Features and Roadmap

 

Sep 10, 2019

The recent Moscow ClickHouse meetup was a big event, as expected. ClickHouse is very popular in Russia and is widely used by companies here. 450 people registered, and around 200 showed up at the Yandex conference hall on Thursday evening. Some attendees flew in from other cities and countries. The first talk started at 7pm, and the last ClickHouse enthusiasts left the building shortly after midnight, full of ideas from interesting talks and conversations.

Alexey Milovidov, ClickHouse creator and lead, opened the meetup by asking the audience for contributors. Contributors get a special gift: a ClickHouse contributor sticker. I myself became a committer a while ago after trying to fix the macOS build and submitting an easy fix to the TCP keepalive setting description. Even if you are not an experienced C++ developer, you can still be a contributor. Simple changes, new functions, or even fixes to documentation all count.

The first speaker was Alexander Krasheninnikov from Badoo (slides). Alexander covered the very interesting topic of ClickHouse and Spark interoperability. His team uses ClickHouse to store events from different systems and to provide data for monitoring metrics, which sometimes involve quite complex logic. In particular, Spark is used to build predictive models from ClickHouse data, move predictions back to ClickHouse, and then run queries on the ClickHouse side for anomaly detection. The data flow is bi-directional: from ClickHouse to Spark and back again. It was especially fascinating to see how they implemented interaction between Spark and ClickHouse clusters in order to utilise the resources of both clusters efficiently. A Spark job is sent to multiple workers, and every worker selects only a fraction of the data to process from ClickHouse. We at Altinity are looking very closely at all projects that promote symbiosis between ClickHouse and the machine learning stack. This was a great example.
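To make the "every worker selects only a fraction" idea concrete, here is a minimal sketch of one way such a split could be expressed. It is not Badoo's implementation: the table name `events`, the key `user_id`, and the helper `build_partition_queries` are all hypothetical, and hashing a key modulo the worker count is simply one common partitioning scheme. The only ClickHouse feature it relies on is the `cityHash64` function.

```python
from typing import List


def build_partition_queries(table: str, shard_key: str, num_workers: int) -> List[str]:
    """Return one SELECT per worker; together they cover every row exactly once.

    Hypothetical helper for illustration only. Each worker would run its own
    query, so no single worker pulls the full table from ClickHouse.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [
        # cityHash64 is a ClickHouse hash function; the modulo assigns each
        # row to exactly one of the num_workers buckets.
        f"SELECT * FROM {table} "
        f"WHERE cityHash64({shard_key}) % {num_workers} = {worker}"
        for worker in range(num_workers)
    ]


# Assumed example values: an 'events' table split by 'user_id' across 4 workers.
queries = build_partition_queries("events", "user_id", 4)
for query in queries:
    print(query)
```

In a real Spark job each query would be handed to a separate task, which keeps the load spread across both the Spark workers and the ClickHouse nodes.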

Alexander also talked about the optimisations he made to JDBC driver usage, and claimed a 10x performance improvement after the changes. He liked tweaking JDBC so much that he even became the official maintainer of the ClickHouse JDBC driver. We can expect further performance optimisations and other improvements from him.

The next talk was equally interesting. Alexander Burmak from the Yandex.Cloud team talked about ClickHouse backups (slides). Yes, you read that correctly: ClickHouse backups! Yandex.Cloud offers ClickHouse-as-a-Service in Russia, and backups are vital for successful cloud business operation. 

Alexander described the challenges of making backups for ClickHouse clusters, mentioned existing options like clickhouse-backup and clickhouse-copier, and presented the Yandex.Cloud solution, ch-backup. ch-backup consists of a management plane, a database to keep backup metadata, and a Python app that is capable of incremental backups and supports retention policies, encryption, and other nice enterprise features. There were a number of interesting ideas in the talk. These include how to create tables in the right order during restore (we do the same in clickhouse-operator, but differently) and how to manage space efficiently. Unfortunately, ch-backup is not open source, and there are no immediate plans to make it available to the community. Nonetheless, it was a good source of ideas for us at Altinity, since we are working on a backup solution for ClickHouse as well.

Apparently Alexander is a popular name in the ClickHouse community, so the next talk was again from an Alexander, in this case me. I reviewed the time-series DBMS problem and explained how ClickHouse solves it effectively despite not being a dedicated time-series DBMS (slides).

In our blog we have covered the ClickHouse approach to time-series quite a bit (see https://altinity.com/blog/tag/Time+Series), so I summarised our published and unpublished research on this topic. That included schema design options, encodings, aggregations with materialized views, table retention rules, and time-series specific queries. ClickHouse really shines in time-series; this is probably the second most successful ClickHouse use case after web/app analytics. Many new features in ClickHouse directly enhance handling of time-series data.

Mikhail Shiryaev from InnoGames continued the time-series story with a real-world use case: he explained how ClickHouse is used as a backend replacement in Graphite — GraphHouse (slides). The talk was a dive into Graphite configuration and ClickHouse optimisation, including application of compression codecs that help to reduce data size significantly.

In his lightning talk, Alexey Lizunov from Moscow Credit Bank turned the audience's attention to another side of ClickHouse: ClickHouse as log storage. ClickHouse is used very often as a replacement for Elasticsearch, and that was exactly the story here. We were glad to hear that our old article about the Logstash plugin for ClickHouse was helpful in getting it up and running. The main pitch of Alexey’s talk was ease of operation. ClickHouse is so easy to use that even people far from the Big Data crowd can start it in minutes and have immediate results! ClickHouse is famous for its jaw-dropping effect when one tries it for the first time.

Interestingly enough, the most recent version of ClickHouse can ‘consume itself’. What I mean is that it can collect its own monitoring data into a dedicated system table as a time series and store its own logs in another system table. This is something we had been waiting for a long time, and finally it is there (in the upcoming 19.14 version).

For the next lightning talk Murat Kabilov flew over from the Adjust office in Berlin. Murat presented the ClickHouse foreign data wrapper for PostgreSQL (https://github.com/adjust/clickhouse_fdw), a major improvement on the Percona initiative from earlier this year. For those tied to PostgreSQL, it is a way to get ClickHouse speed from within PostgreSQL. The FDW supports inserts and selects, predicate pushdown, and other features.

It was already past 10pm when Alexey Milovidov finally returned to the stage. He started by merging two pull requests live: WITH TIES and WITH FILL modifiers for ORDER BY, and glob support for the HDFS and File table engines. It was an entertaining show that included a call for a person from the audience to press the merge button.

After that Alexey started the last, long-awaited talk of the evening. He covered two main topics: cutting edge ClickHouse features and the secret roadmap (slides). Here is an incomplete list of new features that are already available or will be available in the next release:

  • Constraints for INSERT

  • Parameterized queries

  • ORC input format

  • Template format — ClickHouse can now generate HTML pages if you wish

  • ORDER BY optimisation

  • WITH FILL modifiers to fill gaps in data

  • Query masking rules

  • System tables for logs and monitoring

  • Low-level query profiler

  • Globs for external file based formats

  • LIVE VIEW functionality

  • Many performance improvements and so on.

The full list includes many more topics and is very impressive — ClickHouse is evolving very quickly. But what is going to be added to ClickHouse after these is even more impressive. Here are some of my personal favourites for the rest of the year:

  • Hybrid storage support

  • DDL definition for dictionaries

  • S3 import/exports

  • RBAC 

  • Merge join

  • Workload management

At the end Alexey also presented a very ambitious roadmap for 2020, but it is probably too early to show it yet. We plan to discuss the future of ClickHouse in more detail in the upcoming months, and will definitely share the roadmap with the community and our readers.

The audience in Moscow was great. There were a lot of good questions, and 100+ people stayed until the very end. I was glad to talk to smart people and meet old friends. The organization was top notch, thanks to Yandex, who hosted and sponsored this event.

For the first time, the meetup was also streamed live on the ClickHouse YouTube channel, which allowed more people to attend remotely. If you missed the Moscow meetup, there is an opportunity to catch up in Munich on Sep 17th, Paris on Oct 3rd, and San Francisco on Oct 9th. Please come over. ClickHouse is fun!

 
