Altinity
ClickHouse Leading Service Provider

Blog

ClickHouse New Home

Sept 23, 2019
More than three years have passed since ClickHouse unexpectedly emerged from the Yandex labs and became open source. Since then it has grown from an ugly duckling that was sometimes hard to deal with into a mature, well-established analytical database used by companies ranging from small startups to Fortune 500 enterprises all around the world. Today ClickHouse has taken one more important step: it has moved from the Yandex cradle to its own new home on GitHub!

Read More
Moscow Meetup, Cutting Edge ClickHouse Features and Roadmap

Sept 10, 2019

The recent Moscow ClickHouse meetup was quite a big event, as expected. ClickHouse is very popular in Russia and usage has penetrated widely among companies here. 450 people registered, and around 200 showed up at the Yandex conference hall on Thursday evening. Some attendees flew over from other cities and countries. The first talk started at 7pm, and the last clickhousers left the building shortly after midnight, full of ideas from interesting talks and conversations.

Read More
ClickHouse Materialized Views Illuminated, Part 2

Sep 9, 2019

In the previous blog post on materialized views, we introduced a way to construct ClickHouse materialized views that compute sums and counts using the SummingMergeTree engine. The SummingMergeTree can use normal SQL syntax for both types of aggregates. We also let the materialized view definition create the underlying table for data automatically. Both of these techniques are quick but have limitations for production systems.
In the current post we will show how to create a materialized view with a range of aggregate types on an existing table.

Read More
ClickHouse Materialized Views Illuminated, Part 1

Sep 6, 2019

Readers of the Altinity blog know we love ClickHouse materialized views. Materialized views can compute aggregates, read data from Kafka, implement last point queries, and reorganize table primary indexes and sort order. Beyond these functional capabilities, materialized views scale well across large numbers of nodes and work on large datasets. They are one of the distinguishing features of ClickHouse.

Read More
Podcast: Combining Python And SQL To Build A PyData Warehouse

Sept 5, 2019
Robert Hodges on The Python Podcast with Tobias Macey

The ecosystem of tools and libraries in Python for data manipulation and analytics is truly impressive, and continues to grow. There are, however, gaps in their utility that can be filled by the capabilities of a data warehouse. In this episode Robert Hodges discusses how the PyData suite of tools can be paired with a data warehouse for an analytics pipeline that is more robust than either can provide on its own. This is a great introduction to what differentiates a data warehouse from a relational database and ways that you can think differently about running your analytical workloads for larger volumes of data.

Read More
New Altinity Stable ClickHouse 19.11.8 Release Is Out!

Sep 2, 2019

It has been quite a while since we announced the previous 'Altinity Stable' ClickHouse in December 2018. Since then there have been a lot of changes and new features in ClickHouse. The core team has merged almost 1000 pull requests, and 217 contributors completed about 6000 commits. Unfortunately, during those months of active development ClickHouse suffered from stability issues. We have been working hard together with the core ClickHouse team to nail them down. Today we are happy to introduce a new 'Altinity Stable' release 19.11.8!

Read More
Far More than Cloud: Thoughts on the Future of Database Management Systems

July 30, 2019

A recent blog post from Gartner caught our attention at Altinity. The title is "The Future of Database Management Systems is Cloud!" and it makes the not-so-sensational claim that public cloud is now the default platform for managing data.
What's interesting is that the article makes two further claims that deserve very careful scrutiny.

Read More
New Encodings to Improve ClickHouse Efficiency

July 10, 2019

Modern analytical databases would not exist without efficient data compression. Storage gets cheaper and more performant, but data sizes typically grow even faster. Moore's Law for big data outpaces its hardware counterpart. In our blog we have already written about ClickHouse compression (https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse) and the Low Cardinality data type wrapper (https://www.altinity.com/blog/2019/3/27/low-cardinality). In this article we will describe and test the most advanced ClickHouse encodings, which especially shine for time series data. We are proud that some of those encodings have been contributed to ClickHouse by Altinity.

This article presents an early preview of new encoding functionality for ClickHouse release 19.11. At the time of writing, release 19.11 is not yet available. To test the new encodings, you can build ClickHouse from source or install a testing build. We expect ClickHouse release 19.11 to be publicly available in a few weeks.
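
To give a rough intuition for why specialized encodings can help with time series data, here is a minimal Python sketch of plain delta encoding. It is only an illustration under our own assumptions, not ClickHouse's implementation: the function names `delta_encode` and `delta_decode` are hypothetical, and the sample timestamps are made up. The idea is that regularly spaced values turn into small, repetitive differences, which a general-purpose compressor can then handle more easily.

```python
def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    encoded = [values[0]]
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    decoded = []
    total = 0
    for d in deltas:
        total += d
        decoded.append(total)
    return decoded


# Made-up timestamps, ten seconds apart.
timestamps = [1000, 1010, 1020, 1030]
encoded = delta_encode(timestamps)
restored = delta_decode(encoded)
```

Here `encoded` keeps one large value followed by small constant steps, and `restored` equals the original list.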

Read More
Podcast: Scale Your Analytics On The Clickhouse Data Warehouse

July 8, 2019
Robert Hodges and Alexander Zaitsev on Data Engineering Podcast with Tobias Macey

The market for data warehouse platforms is large and varied, with options for every use case. ClickHouse is an open source, column-oriented database engine built for interactive analytics with linear scalability. In this episode Robert Hodges and Alexander Zaitsev explain how it is architected to provide these features, the various unique capabilities that it provides, and how to run it in production. It was interesting to learn about some of the custom data types and performance optimizations that are included.

Read More
Managing ClickHouse Datasets with ad-cli

July 1, 2019

Large datasets are critical for anyone trying out or testing ClickHouse. ClickHouse is so fast that you typically need at least 100M rows to discern differences when tuning queries. Also, killer features like materialized views are much more interesting with large volumes of diverse data. Despite the importance of such datasets to ClickHouse users, there is little tooling available to help manage them easily.

Read More
clickhouse-local: The power of ClickHouse SQL in a single command

June 11, 2019

The most interesting innovations in databases come from asking simple questions. For example: what if you could run ClickHouse queries without a server or attached storage? It would just be SQL queries and the rich ClickHouse function library. What would that look like? What problems could we solve with it?

We can answer the first question easily. It would look like ‘clickhouse-local’! You may not know about this handy tool, as not a lot has been written about it. A simple explanation is that ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utility.

Read More
Handling Variable Time Series Efficiently in ClickHouse

May 23, 2019

ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways. Schema design plays a major role in this. For our recent benchmarking using the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema in order to have fair comparisons. In that design every metric is stored in a separate column. This is the best option for ClickHouse from a performance perspective, as it makes full use of the column store and type specialization.

Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may not be very convenient, so a different approach is required. In this article we discuss multiple ways to design a schema for time series, and do some benchmarking to validate each approach.

Read More
Introducing ClickHouse IPv4 and IPv6 Domains for IP Address Handling

May 21, 2019

One of our customers recently had a problem using ClickHouse: the simple workflow of load-analyze-present wasn't as efficient as they were expecting. The core of the problem was loading and presenting IPv4 and IPv6 addresses, which are traditionally stored in ClickHouse as UInt32 and FixedString(16) columns. These types have many advantages, like a compact footprint and ease of comparing values. But they also have shortcomings that prompted us to seek a better solution.
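
To make the traditional representation concrete, here is a small Python sketch of the conversions that loading and presenting such columns involves: an IPv4 address as a 32-bit unsigned integer and an IPv6 address as 16 raw bytes. It uses only Python's standard `ipaddress` module. The helper names are our own, chosen for illustration, and are not ClickHouse functions.

```python
import ipaddress


def ipv4_to_uint32(text: str) -> int:
    """Text form -> integer that fits a UInt32 column."""
    return int(ipaddress.IPv4Address(text))


def uint32_to_ipv4(value: int) -> str:
    """UInt32 value -> dotted-quad text for presentation."""
    return str(ipaddress.IPv4Address(value))


def ipv6_to_bytes16(text: str) -> bytes:
    """Text form -> 16 raw bytes, the size of a FixedString(16) value."""
    return ipaddress.IPv6Address(text).packed


def bytes16_to_ipv6(raw: bytes) -> str:
    """16 raw bytes -> compressed IPv6 text for presentation."""
    return str(ipaddress.IPv6Address(raw))


as_int = ipv4_to_uint32("192.168.0.1")
as_bytes = ipv6_to_bytes16("2001:db8::1")
```

The stored values are compact and easy to compare, but every load and every presentation step needs a conversion like the ones above.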

Read More