Altinity
ClickHouse Leading Service Provider

Blog

Posts tagged ClickHouse
Moscow Meetup, Cutting Edge ClickHouse Features and Roadmap

Sep 10, 2019

The recent Moscow ClickHouse meetup was quite a big event, as expected. ClickHouse is very popular in Russia and is widely used by companies here. 450 people registered, and around 200 showed up at the Yandex conference hall on Thursday evening. Some attendees flew in from other cities and countries. The first talk started at 7pm, and the last clickhousers left the building shortly after midnight, full of ideas from interesting talks and conversations.

Read More
ClickHouse Materialized Views Illuminated, Part 2

Sep 9, 2019

In the previous blog post on materialized views, we introduced a way to construct ClickHouse materialized views that compute sums and counts using the SummingMergeTree engine. The SummingMergeTree can use normal SQL syntax for both types of aggregates. We also let the materialized view definition create the underlying table for data automatically. Both of these techniques are quick but have limitations for production systems.
In the current post we will show how to create a materialized view with a range of aggregate types on an existing table.

Read More
ClickHouse Materialized Views Illuminated, Part 1

Sep 6, 2019

Readers of the Altinity blog know we love ClickHouse materialized views. Materialized views can compute aggregates, read data from Kafka, implement last point queries, and reorganize table primary indexes and sort order. Beyond these functional capabilities, materialized views scale well across large numbers of nodes and work on large datasets. They are one of the distinguishing features of ClickHouse.

Read More
New Altinity Stable ClickHouse 19.11.8 Release Is Out!

Sep 2, 2019

It has been quite a while since we announced the previous 'Altinity Stable' ClickHouse release in December 2018. Since then there have been a lot of changes and new features in ClickHouse. The core team has merged almost 1000 pull requests, and 217 contributors completed about 6000 commits. Unfortunately, during those months of active development ClickHouse suffered from stability issues. We have been working hard together with the core ClickHouse team to nail them down. Today we are happy to introduce a new 'Altinity Stable' release 19.11.8!

Read More
Far More than Cloud: Thoughts on the Future of Database Management Systems

July 30, 2019

A recent blog post from Gartner caught our attention at Altinity. The title is "The Future of Database Management Systems is Cloud!" and it makes the not-so-sensational claim that public cloud is now the default platform for managing data.
What’s interesting is that the article makes two further claims that deserve very careful scrutiny.

Read More
New Encodings to Improve ClickHouse Efficiency

July 10, 2019

Modern analytical databases would not exist without efficient data compression. Storage gets cheaper and more performant, but data sizes typically grow even faster. Moore’s Law for big data outperforms its analog in hardware. We have already written on this blog about ClickHouse compression (https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse) and the LowCardinality data type wrapper (https://www.altinity.com/blog/2019/3/27/low-cardinality). In this article we will describe and test the most advanced ClickHouse encodings, which especially shine for time series data. We are proud that some of those encodings have been contributed to ClickHouse by Altinity.

This article presents an early preview of new encoding functionality for ClickHouse release 19.11. As of the time of writing, release 19.11 is not yet available. In order to test new encodings ClickHouse can be built from source, or a testing build can be installed. We expect that ClickHouse release 19.11 should be available in public releases in a few weeks.

Read More
Podcast: Scale Your Analytics On The Clickhouse Data Warehouse

July 8, 2019
Robert Hodges and Alexander Zaitsev on Data Engineering Podcast with Tobias Macey

The market for data warehouse platforms is large and varied, with options for every use case. ClickHouse is an open source, column-oriented database engine built for interactive analytics with linear scalability. In this episode Robert Hodges and Alexander Zaitsev explain how it is architected to provide these features, the various unique capabilities that it provides, and how to run it in production. It was interesting to learn about some of the custom data types and performance optimizations that are included.

Read More
Managing ClickHouse Datasets with ad-cli

July 1, 2019

Large datasets are critical for anyone trying out or testing ClickHouse. ClickHouse is so fast that you typically need at least 100M rows to discern differences when tuning queries. Also, killer features like materialized views are much more interesting with large volumes of diverse data. Despite the importance of such datasets to ClickHouse users, there is little tooling available to help manage them easily.

Read More
clickhouse-local: The power of ClickHouse SQL in a single command

June 11, 2019

The most interesting innovations in databases come from asking simple questions. For example: what if you could run ClickHouse queries without a server or attached storage? It would just be SQL queries and the rich ClickHouse function library. What would that look like? What problems could we solve with it?

We can answer the first question easily. It would look like ‘clickhouse-local’! You may not know about this handy tool, as not a lot has been written about it. A simple explanation is that ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utility.

Read More
Handling Variable Time Series Efficiently in ClickHouse

May 23, 2019

ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways. Schema design plays a major role in this. For our recent benchmarking using the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema in order to have fair comparisons. In that design every metric is stored in a separate column. This is best for ClickHouse from a performance perspective, as it perfectly utilizes the column store and type specialization.

Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may not be very convenient, hence a different approach is required. In this article we discuss multiple ways to design a schema for time series, and do some benchmarking to validate each approach.

Read More
Introducing ClickHouse IPv4 and IPv6 Domains for IP Address Handling

May 21, 2019

One of our customers recently had a problem using ClickHouse: the simple workflow of load-analyze-present wasn't as efficient as they were expecting. The crux of the problem was loading and presenting IPv4 and IPv6 addresses, which are traditionally stored in ClickHouse as UInt32 and FixedString(16) columns. These types have many advantages, like compact footprint and ease of comparing values. But they also have shortcomings that prompted us to seek a better solution.
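
To make the traditional storage concrete, here is a minimal Python sketch of the two representations: an IPv4 address as a 32-bit unsigned integer and an IPv6 address as 16 raw bytes. It uses only the standard `ipaddress` module and does not talk to ClickHouse; the helper names are hypothetical and chosen for illustration.

```python
import ipaddress


def ipv4_to_uint32(text: str) -> int:
    """Return the IPv4 address as an unsigned 32-bit integer (UInt32-style)."""
    return int(ipaddress.IPv4Address(text))


def uint32_to_ipv4(value: int) -> str:
    """Convert a 32-bit integer back to dotted-quad text for presentation."""
    return str(ipaddress.IPv4Address(value))


def ipv6_to_fixed16(text: str) -> bytes:
    """Return the IPv6 address as 16 bytes in network order (FixedString(16)-style)."""
    return ipaddress.IPv6Address(text).packed


if __name__ == "__main__":
    print(ipv4_to_uint32("192.168.0.1"))        # 3232235521
    print(uint32_to_ipv4(3232235521))           # 192.168.0.1
    print(len(ipv6_to_fixed16("2001:db8::1")))  # 16
    # Integers compare in address order; dotted-quad strings do not.
    print(ipv4_to_uint32("10.0.0.2") < ipv4_to_uint32("10.0.0.10"))  # True
    print("10.0.0.2" < "10.0.0.10")                                  # False
```

The sketch shows both sides of the trade-off: the numeric forms are compact and compare correctly, but every load and every presentation step needs an explicit conversion to and from text.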

Read More
ClickHouse In the Storm. Part 1: Maximum QPS estimation

May 2, 2019

ClickHouse is an OLAP database for analytics, so the typical use scenario is processing a relatively small number of requests -- from several per hour to many dozens or even low hundreds per second -- affecting huge ranges of data (gigabytes/millions of rows).

But how will it behave in other scenarios? Let's try to use a steam-hammer to crack nuts, and check how ClickHouse will deal with thousands of small requests per second. This will help us to better understand the range of possible use cases and limitations.

This post has two parts. The first part covers connectivity benchmarks and test setup. The next part covers maximum QPS in scenarios involving actual data.

Read More
Altinity ClickHouse Operator for Kubernetes

Apr 9, 2019

When I was setting up my first ClickHouse clusters 3 years ago, it was like a journey to an unknown world full of caveats. ClickHouse is very simple and easy to use but not THAT simple. Sometimes I dreamed that setting up the cluster would be as easy as making a cup of coffee. It took us a while to find the right approach, but finally our dreams came true. Today, we are happy to introduce the ClickHouse operator for Kubernetes!

Read More
A Magical Mystery Tour of the LowCardinality Data Type

Mar 27, 2019

Many ClickHouse features like the LowCardinality data type seem mysterious to new users. ClickHouse often deviates from standard SQL, and many data types and operations do not even exist in other data warehouses. The key to understanding is that the ClickHouse engineering team values speed more than almost any other property. Mysterious SQL expressions often turn out to be 'secret weapons' to achieve unmatched speed.

In fact, the LowCardinality data type is an example of just such a feature. It has been available since Q4 2018 and was marked as production ready in Feb 2019, but is still not documented, magically appearing in some documentation examples. In this article we will fill the gap by explaining how LowCardinality works, and when it should be used.

Read More
ClickHouse and Python: Jupyter Notebooks

Feb 25, 2019
Jupyter Notebooks are an indispensable tool for sharing code between users in Python data science. For those unfamiliar with them, notebooks are documents that contain runnable code snippets mixed with documentation. They can invoke Python libraries for numerical processing, machine learning, and visualization. The code output includes not just text output but also graphs from powerful libraries like matplotlib and seaborn. Notebooks are so ubiquitous that it’s hard to think of manipulating data in Python without them.

ClickHouse support for Jupyter Notebooks is excellent. I have spent the last several weeks playing around with Jupyter Notebooks using two community drivers: clickhouse-driver and clickhouse-sqlalchemy. The results are now published on GitHub at https://github.com/Altinity/clickhouse-python-examples. The remainder of this blog contains tips to help you integrate ClickHouse data into your notebooks.

Read More