Altinity
ClickHouse Leading Service Provider

Blog

New Encodings to Improve ClickHouse Efficiency

July 10, 2019

Modern analytical databases would not exist without efficient data compression. Storage gets cheaper and more performant, but data sizes typically grow even faster. Moore’s Law for big data outpaces its hardware counterpart. On our blog we have already written about ClickHouse compression (https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse) and the LowCardinality data type wrapper (https://www.altinity.com/blog/2019/3/27/low-cardinality). In this article we will describe and test the most advanced ClickHouse encodings, which especially shine for time series data. We are proud that some of those encodings have been contributed to ClickHouse by Altinity.

This article presents an early preview of new encoding functionality for ClickHouse release 19.11. As of the time of writing, release 19.11 is not yet available. In order to test new encodings ClickHouse can be built from source, or a testing build can be installed. We expect that ClickHouse release 19.11 should be available in public releases in a few weeks.

Managing ClickHouse Datasets with ad-cli

July 1, 2019

Large datasets are critical for anyone trying out or testing ClickHouse. ClickHouse is so fast that you typically need at least 100M rows to discern differences when tuning queries. Also, killer features like materialized views are much more interesting with large volumes of diverse data. Despite the importance of such datasets to ClickHouse users, there is little tooling available to help manage them easily.

clickhouse-local: The power of ClickHouse SQL in a single command

June 11, 2019

The most interesting innovations in databases come from asking simple questions.  For example: what if you could run ClickHouse queries without a server or attached storage?  It would just be SQL queries and the rich ClickHouse function library. What would that look like?  What problems could we solve with it?

We can answer the first question easily. It would look like ‘clickhouse-local’! You may not know about this handy tool, as not a lot has been written about it. A simple explanation is that ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utility.

Handling Variable Time Series Efficiently in ClickHouse

May 23, 2019

ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways. Schema design plays a major role in this. For our recent benchmarking using the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema in order to have fair comparisons. In that design every metric is stored in a separate column. This is the best for ClickHouse from a performance perspective, as it perfectly utilizes column store and type specialization.

Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may not be very convenient, hence a different approach is required. In this article we discuss multiple ways to design a schema for time series, and do some benchmarking to validate each approach.

Introducing ClickHouse IPv4 and IPv6 Domains for IP Address Handling

May 21, 2019

One of our customers recently had a problem using ClickHouse: the simple workflow of load-analyze-present wasn't as efficient as they were expecting. The crux of the problem was loading and presenting IPv4 and IPv6 addresses, which are traditionally stored in ClickHouse as UInt32 and FixedString(16) columns. These types have many advantages, like compact footprint and ease of comparing values. But they also have shortcomings that prompted us to seek a better solution.
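
To make the traditional representation concrete, here is a minimal Python sketch of the conversions involved. It uses only the standard `ipaddress` module; the helper names are hypothetical and chosen for illustration, and the sketch does not touch ClickHouse itself. It shows why the numeric form is compact and easy to compare, and also why a conversion step is needed every time data is loaded or presented.

```python
import ipaddress

# Hypothetical helpers: illustrate the classic storage formats only.

def ipv4_to_uint32(text: str) -> int:
    """Dotted-quad IPv4 text -> the 32-bit integer stored in a UInt32 column."""
    return int(ipaddress.IPv4Address(text))

def uint32_to_ipv4(value: int) -> str:
    """32-bit integer -> human-readable dotted-quad text."""
    return str(ipaddress.IPv4Address(value))

def ipv6_to_fixed16(text: str) -> bytes:
    """IPv6 text -> the 16 raw bytes stored in a FixedString(16) column."""
    return ipaddress.IPv6Address(text).packed

def fixed16_to_ipv6(raw: bytes) -> str:
    """16 raw bytes -> compressed IPv6 text."""
    return str(ipaddress.IPv6Address(raw))

if __name__ == "__main__":
    n = ipv4_to_uint32("192.168.0.1")
    print(n, uint32_to_ipv4(n))
    raw = ipv6_to_fixed16("2001:db8::1")
    print(len(raw), fixed16_to_ipv6(raw))
    # Integers order addresses numerically; plain strings do not.
    print(ipv4_to_uint32("10.0.0.2") < ipv4_to_uint32("10.0.0.10"))
    print("10.0.0.2" < "10.0.0.10")
```

The last two lines hint at the trade-off: integer storage compares addresses correctly, but every insert and every report has to pass through a conversion like the ones above.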

ClickHouse In the Storm. Part 1: Maximum QPS estimation

May 2, 2019

ClickHouse is an OLAP database for analytics, so the typical use scenario is processing a relatively small number of requests -- from several per hour to many dozens or even low hundreds per second -- affecting huge ranges of data (gigabytes/millions of rows).

But how will it behave in other scenarios? Let's try to use a steam-hammer to crack nuts, and check how ClickHouse will deal with thousands of small requests per second. This will help us to understand the range of possible use cases and limitations better.

This post has two parts. The first part covers connectivity benchmarks and test setup. The next part covers maximum QPS in scenarios involving actual data.

Altinity ClickHouse Operator for Kubernetes

Apr 9, 2019

When I was setting up my first ClickHouse clusters 3 years ago it was like a journey to an unknown world full of caveats. ClickHouse is very simple and easy to use but not THAT simple. Sometimes I dreamed that setting up the cluster would be as easy as making a cup of coffee. It took us a while to find the right approach, but finally our dreams came true. Today, we are happy to introduce ClickHouse operator for Kubernetes!

A Magical Mystery Tour of the LowCardinality Data Type

Mar 27, 2019

Many ClickHouse features like LowCardinality data type seem mysterious to new users.  ClickHouse often deviates from standard SQL and many data types and operations do not even exist in other data warehouses. The key to understanding is that the ClickHouse engineering team values speed more than almost any other property. Mysterious SQL expressions often turn out to be 'secret weapons' to achieve unmatched speed.

In fact, the LowCardinality data type is an example of just such a feature. It has been available since Q4 2018 and was marked as production ready in Feb 2019, but is still not documented, appearing only incidentally in some documentation examples. In this article we will fill the gap by explaining how LowCardinality works, and when it should be used.

Do-It-Yourself Multi-Volume Storage in ClickHouse

Mar 5, 2019

Many applications have very different requirements for acceptable latencies / processing speed on different parts of the database. In time-series use cases most of your requests touch only the last day of data (‘hot’ data). Those queries should run very fast. Also a lot of background processing actions happen on the ‘hot’ data--inserts, merges, replications, and so on. Such operations should likewise be processed with the highest possible speed and without significant latencies.

ClickHouse and Python: Jupyter Notebooks

Feb 25, 2019

Jupyter Notebooks are an indispensable tool for sharing code between users in Python data science. For those unfamiliar with them, notebooks are documents that contain runnable code snippets mixed with documentation. They can invoke Python libraries for numerical processing, machine learning, and visualization. The code output includes not just text output but also graphs from powerful libraries like matplotlib and seaborn. Notebooks are so ubiquitous that it’s hard to think of manipulating data in Python without them.

ClickHouse support for Jupyter Notebooks is excellent. I have spent the last several weeks playing around with Jupyter Notebooks using two community drivers: clickhouse-driver and clickhouse-sqlalchemy. The results are now published on GitHub at https://github.com/Altinity/clickhouse-python-examples. The remainder of this blog contains tips to help you integrate ClickHouse data into your notebooks.

ClickHouse Meetup at Cloudflare. Recap

Feb 20, 2019

The ClickHouse Meetup at Cloudflare went great! It was a pleasure to see old friends and to meet new people enthusiastic about ClickHouse. Robert Hodges gave an intro talk about the ClickHouse execution model and how it contributes to rapid query responses. Alex Hofsteede walked through how Sentry.io uses ClickHouse and the steps they went through to migrate applications seamlessly onto ClickHouse from other solutions.

ClickHouse Continues to Crush Time Series

Feb 14, 2019

In our previous articles we demonstrated that ClickHouse -- a general purpose analytics DB -- can easily compete with specialized DBMSs for time series data: TimescaleDB and InfluxDB. There were, however, certain queries, pretty typical for time series, where ClickHouse seemed at first glance to be at a disadvantage. The most notable example is returning the latest measurement for a particular device. We will take this query and demonstrate how advanced ClickHouse features, namely materialized views and self-aggregating tables, can dramatically improve performance.

ClickHouse and Python: Getting to Know the Clickhouse-driver Client

Feb 1, 2019

Python is a force in the world of analytics due to powerful libraries like numpy along with a host of machine learning frameworks. ClickHouse is an increasingly popular store of data. As a Python data scientist you may wonder how to connect them. This post contains a review of the clickhouse-driver client.  It’s a solidly engineered module that is easy to use and integrates easily with standard tools like Jupyter Notebooks and Anaconda.  Clickhouse-driver is a great way to jump into ClickHouse Python connectivity.

Field Report: Migrating from Redshift to ClickHouse

Jan 25, 2019

FunCorp is an international developer of entertainment apps. The most popular is iFunny - a fun picture and GIF app that lets users pass the time looking at memes, comics, funny pictures, cat GIFs, etc. Plus, users can even upload their own content and share it. The iFunny app has been using Redshift for quite some time as a database for events in backend services and mobile apps. We went with it because in the beginning there really weren’t any alternatives comparable in terms of cost and convenience. However, the public release of ClickHouse was a real game changer. We studied it inside and out, compared the cost and possible architecture, and this summer finally decided to try it out and see if we could use it. This article is all about the challenge Redshift had been helping us solve and how we migrated this solution to ClickHouse.
