ClickHouse Roadmap 2018
Jan 8, 2018
The ClickHouse development team is known for not publishing its development plans. Sometimes the developers disclose plans in response to questions and bug reports, but more often they discuss the roadmap at meetups.
In this article, we describe the 'secret' 2018 roadmap that was presented at the ClickHouse December meetup in Moscow.
Certainly, the roadmap is not final, and there are no strict commitments on dates either.
Short-term plans (Q1/2018)
Multi-dimensional and nested arrays
It may look like this:
CREATE TABLE t ( x Array(Array(String)), z Nested( x Array(String), y Nested(...)) ) ENGINE = MergeTree ORDER BY x
External MySQL and ODBC tables
Currently, external tables can be integrated into ClickHouse using external dictionaries. The new feature will be an alternative and more convenient way to do so.
SELECT ... FROM mysql('host:port', 'db', 'table', 'user', 'password')
Efficient data copy between ClickHouse clusters
Currently, it is possible to copy data using the remote() function, e.g.:
INSERT INTO t SELECT * FROM remote(...)
Proper distributed execution will improve the performance of such copies.
O_DIRECT for merges
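This should improve OS page cache behavior and, correspondingly, the performance of 'hot' queries, since merges performed with O_DIRECT bypass the page cache and therefore do not evict frequently read data.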
Middle term (Q1-Q2/2018)
UPDATE/DELETE in order to comply with the European GDPR
Protobuf and Parquet input/output formats
Create dictionaries by DDL queries
Currently, dictionaries are defined in external XML files even though they are part of the DB schema, which is inconvenient and confusing. The new approach will fix that.
WITH ROLLUP and WITH CUBE for GROUP BY
Custom encoding/compression for columns
Currently, ClickHouse supports LZ4 and ZSTD compression for columns, and compression settings are global (see our article 'Compression in ClickHouse' for more details). Column-level encoding (e.g. delta encoding) and compression will allow more efficient data storage and therefore faster queries.
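To illustrate why a column-level encoding can help, here is a minimal Python sketch of delta encoding applied before a general-purpose compressor. It is not ClickHouse code: the function names and the sample timestamp column are hypothetical, and zlib from the Python standard library stands in for LZ4/ZSTD.

```python
import struct
import zlib


def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    previous = 0
    deltas = []
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    total = 0
    values = []
    for delta in deltas:
        total += delta
        values.append(total)
    return values


def pack(values):
    """Serialize integers as little-endian signed 64-bit numbers."""
    return struct.pack(f"<{len(values)}q", *values)


# Hypothetical column: monotonically growing timestamps with gaps of 1-5 seconds.
timestamps = []
current = 1_514_764_800
for i in range(10_000):
    current += 1 + (i * 7919) % 5
    timestamps.append(current)

# zlib is only a stand-in for the column compressor (assumption).
raw_size = len(zlib.compress(pack(timestamps)))
delta_size = len(zlib.compress(pack(delta_encode(timestamps))))

print(f"compressed without delta encoding: {raw_size} bytes")
print(f"compressed with delta encoding:    {delta_size} bytes")
```

The deltas are small, repetitive numbers, so the compressor finds far more redundancy in them than in the raw, ever-changing timestamps. Such an encoding only pays off for columns with this kind of regularity, which is why a per-column setting is more useful than a global one.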
Store data at multiple disk volumes of a single server
That will make it easier to extend the disk system, as well as to use different disk systems for different DBs or tables. Currently, users have to use symlinks if a DB or table needs to be stored on another volume.
A lot of enhancements and fixes are planned for query execution. In particular:
Using the index for ‘IN (subquery)’
Currently, the index is not used for such queries, resulting in lower performance.
Predicate pushdown from ‘WHERE’ into subqueries and predicate pushdown for views
These two are related, since a view is replaced by a subquery. Currently, filter conditions on views perform poorly: views cannot use the primary key of the underlying table, which makes views on big tables pretty much useless.
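The effect of pushdown can be sketched in a few lines of Python. This is a toy model under our own assumptions, not ClickHouse internals: the table, the `day` key, and both view functions are hypothetical. The point is that once the filter is inside the view, the sorted key can be used to jump to the matching range instead of scanning everything.

```python
from bisect import bisect_left, bisect_right

# Hypothetical table: rows are sorted by the primary key `day`.
TABLE = [(day, day * 10) for day in range(1, 1001)]
DAYS = [row[0] for row in TABLE]


def view_without_pushdown(counter):
    """The view scans every row; the caller filters afterwards."""
    for row in TABLE:
        counter["scanned"] += 1
        yield row


def view_with_pushdown(counter, low, high):
    """The filter on `day` is pushed into the view, so the sorted key
    is used to locate the matching range directly."""
    start = bisect_left(DAYS, low)
    stop = bisect_right(DAYS, high)
    for row in TABLE[start:stop]:
        counter["scanned"] += 1
        yield row


# Filter applied outside the view: every row is read first.
outer = {"scanned": 0}
result_outer = [r for r in view_without_pushdown(outer) if 100 <= r[0] <= 109]

# Filter pushed into the view: only the matching range is read.
pushed = {"scanned": 0}
result_pushed = list(view_with_pushdown(pushed, 100, 109))

print(outer["scanned"], pushed["scanned"])
```

Both variants return the same rows, but the first one reads the whole table while the second reads only the rows in the requested key range.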
Short-circuit expressions evaluation (ternary operator, if, multiIf)
Currently, ClickHouse evaluates all branches even when the condition already determines which one will be returned.
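The difference can be modeled with a small Python sketch over a column of values. It is only an illustration under our assumptions (the function names and the call counter are hypothetical, and this is not how ClickHouse is implemented): the eager variant computes both branches for every row, while the short-circuit variant computes a branch only for the rows that need it.

```python
calls = {"expensive": 0}


def cheap(x):
    return x


def expensive(x):
    """Stand-in for a costly expression; counts how often it runs."""
    calls["expensive"] += 1
    return x * x


def eager_if(conditions, then_fn, else_fn, column):
    """Evaluate both branches for every row, then pick per row."""
    then_col = [then_fn(x) for x in column]
    else_col = [else_fn(x) for x in column]
    return [t if c else e for c, t, e in zip(conditions, then_col, else_col)]


def short_circuit_if(conditions, then_fn, else_fn, column):
    """Evaluate only the branch selected by the condition for each row."""
    return [then_fn(x) if c else else_fn(x) for c, x in zip(conditions, column)]


column = [0, 1, 2, 3, 4, 5]
conditions = [x % 2 == 0 for x in column]

calls["expensive"] = 0
eager_result = eager_if(conditions, cheap, expensive, column)
eager_calls = calls["expensive"]

calls["expensive"] = 0
lazy_result = short_circuit_if(conditions, cheap, expensive, column)
lazy_calls = calls["expensive"]

print(eager_calls, lazy_calls)
```

The results are identical, but the costly branch runs for every row in the eager variant and only for the rows where the condition is false in the short-circuit variant.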
Using primary key for GROUP BY and ORDER BY
This may speed up certain types of queries, since the data is already partially sorted by the primary key.
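As a rough illustration of why sort order helps GROUP BY, the Python sketch below compares hash aggregation with streaming aggregation over rows that arrive sorted by the grouping key. The rows and function names are hypothetical, and the sketch assumes fully sorted input, which is a simplification of the partially sorted data mentioned above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows (key, value), already sorted by key.
ROWS = [("a", 1), ("a", 2), ("b", 5), ("c", 3), ("c", 4)]


def hash_group_sum(rows):
    """Generic aggregation: keeps a hash table with one entry per key."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return sorted(totals.items())


def streaming_group_sum(rows):
    """Aggregation over sorted input: equal keys are adjacent, so each
    group can be emitted as soon as the key changes."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


hashed = hash_group_sum(ROWS)
streamed = list(streaming_group_sum(ROWS))
print(streamed)
```

The streaming variant holds only one group in memory at a time and produces output already ordered by key, so a subsequent ORDER BY on the same key needs no extra sort.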
Longer term (Q3-Q4/2018)
Longer-term plans are not yet finalized. There are two major projects on the list so far.
Resource pools for query execution
That will allow managing workloads more efficiently.
ANSI SQL JOIN syntax
That will make ClickHouse more friendly for numerous SQL tools.
This is all we know about the Yandex ClickHouse development team's plans at the time of writing. Most of the planned features and improvements were requested by the community some time ago. Certainly, we expect other new features to pop up as well, along with community contributions.
We will keep this post updated as new details become available.