Sept 23, 2019
It has been more than three years since ClickHouse unexpectedly emerged from the Yandex labs and moved to open source. Since then it has grown from an ugly duckling that was sometimes not easy to deal with into a mature, well-established analytical database, used at companies from small startups to Fortune 500 enterprises all around the world. Today ClickHouse has taken one more important step: it has moved from the Yandex cradle to its own new home on GitHub!
Sep 16, 2019
Grafana is a very powerful and popular open source dashboard tool used in many ClickHouse projects. ClickHouse in Grafana is enabled by the Grafana ClickHouse plugin developed by the company Vertamedia, a ClickHouse early adopter. Many users in the community now depend on the Vertamedia implementation.
Sept 10, 2019
The recent Moscow ClickHouse meetup was quite a big event, as expected. ClickHouse is very popular in Russia and usage has penetrated widely among companies here. 450 people registered, and around 200 showed up at the Yandex conference hall on Thursday evening. Some attendees flew over from other cities and countries. The first talk started at 7pm, and the last clickhousers left the building shortly after midnight, full of ideas from interesting talks and conversations.
Sep 9, 2019
In the previous blog post on materialized views, we introduced a way to construct ClickHouse materialized views that compute sums and counts using the SummingMergeTree engine. The SummingMergeTree can use normal SQL syntax for both types of aggregates. We also let the materialized view definition create the underlying table for data automatically. Both of these techniques are quick but have limitations for production systems.
In the current post we will show how to create a materialized view with a range of aggregate types on an existing table.
Sep 6, 2019
Readers of the Altinity blog know we love ClickHouse materialized views. Materialized views can compute aggregates, read data from Kafka, implement last point queries, and reorganize table primary indexes and sort order. Beyond these functional capabilities, materialized views scale well across large numbers of nodes and work on large datasets. They are one of the distinguishing features of ClickHouse.
Sept 5, 2019
Robert Hodges on The Python Podcast with Tobias Macey
The ecosystem of tools and libraries in Python for data manipulation and analytics is truly impressive, and continues to grow. There are, however, gaps in their utility that can be filled by the capabilities of a data warehouse. In this episode Robert Hodges discusses how the PyData suite of tools can be paired with a data warehouse for an analytics pipeline that is more robust than either can provide on its own. This is a great introduction to what differentiates a data warehouse from a relational database and ways that you can think differently about running your analytical workloads for larger volumes of data.
Sep 2, 2019
It has been quite a while since we announced the previous ‘Altinity Stable’ ClickHouse in December 2018. Since then there have been a lot of changes and new features in ClickHouse. The core team has merged almost 1000 pull requests, and 217 contributors completed about 6000 commits. Unfortunately, during those months of active development ClickHouse suffered from stability issues. We have been working hard together with the core ClickHouse team to nail them down. Today we are happy to introduce a new ‘Altinity Stable’ release 19.11.8!
Aug 26, 2019
On August 17 I had the pleasure of presenting at Data Con LA 2019. My talk was Data Warehouse and Kubernetes: Lessons from the ClickHouse Operator. It described learnings from our work to enable ClickHouse to run easily on Kubernetes. This short article discusses key points from the talk as well as takeaways from the conference itself.
Aug 19, 2019
The latest San Francisco Bay Area ClickHouse Meetup was in Silicon Valley on August 13th. We had between 25 and 30 attendees at H2O.ai, who kindly hosted the event at their offices in Mountain View. The crowd was enthusiastic, leading to a lot of back-and-forth questions during the presentations. We had a total of three talks.
July 30, 2019
A recent blog post from Gartner caught our attention at Altinity. The title is “The Future of Database Management Systems is Cloud!” and it makes the not-so-sensational claim that public cloud is now the default platform for managing data.
What’s interesting is that the article makes two further claims that deserve very careful scrutiny.
July 24, 2019
If you’re a business that manages a lot of commercial data, and you need a solution that reliably stores and analyzes that information, then this call is for you!
Robert Hodges, Altinity CEO with Rick Nuske, My Future Business
July 10, 2019
Modern analytical databases would not exist without efficient data compression. Storage gets cheaper and more performant, but data sizes typically grow even faster. Moore’s Law for big data outpaces its analog in hardware. We have already written on our blog about ClickHouse compression (https://www.altinity.com/blog/2017/11/21/compression-in-clickhouse) and the Low Cardinality data type wrapper (https://www.altinity.com/blog/2019/3/27/low-cardinality). In this article we will describe and test the most advanced ClickHouse encodings, which especially shine for time series data. We are proud that some of those encodings have been contributed to ClickHouse by Altinity.
This article presents an early preview of new encoding functionality for ClickHouse release 19.11. At the time of writing, release 19.11 is not yet available. To test the new encodings, ClickHouse can be built from source, or a testing build can be installed. We expect ClickHouse release 19.11 to be available in public releases in a few weeks.
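To give a feel for why specialized encodings can help time series data, here is a minimal sketch of delta encoding in Python. It is an illustration only: the excerpt above does not name the encodings the article tests, so the choice of delta encoding, the helper names `delta_encode` and `delta_decode`, and the sample timestamps are all assumptions made for this example. The idea is that regularly spaced values turn into a run of identical small differences, which a general-purpose compressor (zlib here, standing in for whatever codec a database would use) handles far better than the raw values.

```python
import itertools
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum over the deltas."""
    return list(itertools.accumulate(deltas))


# Hypothetical sample: 1000 timestamps taken every 10 seconds.
timestamps = [1_560_000_000 + 10 * i for i in range(1000)]
deltas = delta_encode(timestamps)

# Serialize both sequences as little-endian unsigned 32-bit integers.
raw_bytes = struct.pack("<1000I", *timestamps)
delta_bytes = struct.pack("<1000I", *deltas)

raw_size = len(zlib.compress(raw_bytes))
delta_size = len(zlib.compress(delta_bytes))
print(f"compressed raw: {raw_size} bytes, compressed deltas: {delta_size} bytes")
```

Both serialized forms occupy the same 4000 bytes before compression; only the compressed sizes differ, and the encoding is lossless because `delta_decode` restores the original list.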
July 8, 2019
Robert Hodges and Alexander Zaitsev on Data Engineering Podcast with Tobias Macey
The market for data warehouse platforms is large and varied, with options for every use case. ClickHouse is an open source, column-oriented database engine built for interactive analytics with linear scalability. In this episode Robert Hodges and Alexander Zaitsev explain how it is architected to provide these features, the various unique capabilities that it provides, and how to run it in production. It was interesting to learn about some of the custom data types and performance optimizations that are included.
July 1, 2019
Large datasets are critical for anyone trying out or testing ClickHouse. ClickHouse is so fast that you typically need at least 100M rows to discern differences when tuning queries. Also, killer features like materialized views are much more interesting with large volumes of diverse data. Despite the importance of such datasets to ClickHouse users, there is little tooling available to help manage them easily.
June 11, 2019
The most interesting innovations in databases come from asking simple questions. For example: what if you could run ClickHouse queries without a server or attached storage? It would just be SQL queries and the rich ClickHouse function library. What would that look like? What problems could we solve with it?
We can answer the first question easily. It would look like ‘clickhouse-local’! You may not know about this handy tool, as not a lot has been written about it. A simple explanation is that ‘clickhouse-local’ turns the ClickHouse SQL query processor into a command line utility.
June 10, 2019
The last two weeks have been a busy time for ClickHouse-related events. Altinity as well as the Yandex teams have been doing a world tour that included events in the US as well as Asia. Besides the opportunity to meet people there were a lot of great presentations on ClickHouse itself.
May 23, 2019
ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways. Schema design plays a major role in this. For our recent benchmarking using the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema in order to have fair comparisons. In that design every metric is stored in a separate column. This is the best for ClickHouse from a performance perspective, as it perfectly utilizes column store and type specialization.
Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may not be very convenient, hence a different approach is required. In this article we discuss multiple ways to design a schema for time series, and do some benchmarking to validate each approach.
May 21, 2019
One of our customers recently had a problem using ClickHouse: the simple workflow of load-analyze-present wasn't as efficient as they were expecting. The crux of the problem was with loading and presenting IPv4 and IPv6 addresses, which are traditionally stored in ClickHouse as UInt32 and FixedString(16) columns. These types have many advantages, like compact footprint and ease of comparing values. But they also have shortcomings that prompted us to seek a better solution.
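As a rough illustration of what those storage types imply, the Python sketch below converts textual addresses into a 32-bit unsigned integer (the shape of a UInt32 value) and a 16-byte string (the shape of a FixedString(16) value), and back again. It uses only the standard `ipaddress` module; the helper names are hypothetical and this is not how ClickHouse itself performs the conversion. It shows both sides of the trade-off: the stored forms are compact and directly comparable, but a conversion step is needed whenever addresses are loaded or presented.

```python
import ipaddress


def ipv4_to_uint32(text: str) -> int:
    """Textual IPv4 address -> integer that fits in an unsigned 32-bit column."""
    return int(ipaddress.IPv4Address(text))


def uint32_to_ipv4(value: int) -> str:
    """Unsigned 32-bit integer -> dotted-quad text for presentation."""
    return str(ipaddress.IPv4Address(value))


def ipv6_to_fixed16(text: str) -> bytes:
    """Textual IPv6 address -> exactly 16 bytes in network byte order."""
    return ipaddress.IPv6Address(text).packed


def fixed16_to_ipv6(raw: bytes) -> str:
    """16 raw bytes -> compressed IPv6 text for presentation."""
    return str(ipaddress.IPv6Address(raw))


stored_v4 = ipv4_to_uint32("192.168.0.1")
stored_v6 = ipv6_to_fixed16("2001:db8::1")

print(stored_v4)                    # the integer that would be stored
print(len(stored_v6))               # always 16 bytes, whatever the text form
print(uint32_to_ipv4(stored_v4))    # back to readable text
print(fixed16_to_ipv6(stored_v6))   # back to readable text
```

Comparing stored values is a plain integer or byte comparison, for example `ipv4_to_uint32("10.0.0.1") < ipv4_to_uint32("10.0.0.2")`, whereas the stored forms are not readable without converting back.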
May 3, 2019
The previous post surveyed connectivity benchmarks for ClickHouse to estimate general performance of server concurrency. In this next post we will take on real-life examples and explore concurrency performance when actual data are involved.