Custom partitioning in ClickHouse 1.1.54310
Nov 8, 2017
ClickHouse introduced a new major feature in version 1.1.54310 - it is a custom partitioning for tables with MergeTree engine. Before it enforced to use Date field for partitioning and the only choice to partition by month.
The new version allows much more flexibility and we can choose the partition schema not only based on Date fields.
The feature still marked as experimental and we need to start the server with
Let’s review how to use this custom partitioning. We will take the table from ontime benchmark mentioned on the page https://clickhouse.yandex/docs/en/getting_started/example_datasets/ontime.html
The table is created as:
) ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192)
With new syntax it can be rewritten as:
) ENGINE = MergeTree PARTITION BY toYYYYMM(FlightDate) Order By (Year, FlightDate);
Now it is clear how to change the definition for daily partitioning:
) ENGINE = MergeTree PARTITION BY FlightDate Order By (Year, FlightDate);
Or to use partitioning by different non-Date column:
) ENGINE = MergeTree PARTITION BY AirlineID Order By FlightDate;
Now, we can drop data for one specific date (e.g. for 1987-10-01)
ALTER TABLE ontime DROP PARTITION '1987-10-01';
Why is this feature useful?
It provides better granularity to manage data. We can perform data maintenance for the period which is required for business tasks, and not just by prescribed monthly periods.
One thing to be aware, that more fine partitioning may affect the compression.
For example with the default monthly partitioning and LZ4 compression, the full ontime table takes 7.7GB on disk, and when we switch to daily partition the size growth to 15GB. This is because the data for compression becomes much shorter, so the compression is less efficient.