New ClickHouse Altinity Stable Release 19.16.10.44

Welcome to the year 2020! The New Year has arrived, and it is a good time to update your ClickHouse to the latest Altinity Stable version, which we are presenting today. As always, Altinity Stable is a version we know is good and recommend for production use. You can use a newer version in your lab environment to try the latest features.

The previous Altinity Stable version was 19.13.7.57, and a huge amount of work has been done since then: 2744 commits in 528 pull requests from 113 contributors. While the most productive contributors are from the Yandex team (thank you, guys, you’re great!), we are proud to say that several important features were implemented by Altinity or in collaboration between us and Yandex. Please see the detailed release notes below.

Major changes since 19.13.7

Notable new features

  • Tiered storage: multiple storage volumes for MergeTree tables. It’s possible to store fresh data on SSD and automatically move old data to HDD. We already discussed this functionality in our blog.
  • LIVE VIEW tables that we also described in our blog. The support in this release is not complete, but new improvements are coming quickly.  Feel free to try it out.
  • WITH FILL modifier of ORDER BY for data gap filling. It allows you to fill in missing data to produce uniform reports. For example, you can fill in missing dates for a time series so that every day is shown even if some days have no data. This feature is not documented yet, so here is an example of how missing dates can be filled:
SELECT arrayJoin([today()-10, today()]) AS d ORDER BY d ASC WITH FILL 
  • Table constraints that are checked at insert.
  • Sensitive data masking for query_log, server logs, process list with regexp-based rules.
  • Table function input() for reading incoming data in an INSERT SELECT query. A very useful feature when you need to preprocess data just before inserting (see the sketch after this list). We are going to showcase it in our planned articles around log processing use cases.
  • Cascaded materialized views. This is an important feature for many use cases, including Kafka integration. In particular, it allows you to load raw data from Kafka using one materialized view and then aggregate it using another, which was not possible in previous ClickHouse releases.
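
Here is a minimal sketch of the input() table function; the target table, the declared structure, and the transformation are hypothetical:
INSERT INTO target_table
SELECT lower(s) AS s, n * 2 AS n
FROM input('s String, n UInt32')
FORMAT CSV
The data itself is sent after the query in the declared format, for example via clickhouse-client or the HTTP interface.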
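
And here is a minimal sketch of a materialized view cascade on top of Kafka; all table, topic, and column names are hypothetical, and the Kafka settings are only an assumption for illustration:
CREATE TABLE kafka_events (ts DateTime, user_id UInt64)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092', kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse', kafka_format = 'JSONEachRow';

CREATE TABLE events_raw (ts DateTime, user_id UInt64)
ENGINE = MergeTree ORDER BY ts;

CREATE TABLE events_daily (day Date, events UInt64)
ENGINE = SummingMergeTree ORDER BY day;

-- First materialized view: load raw data from Kafka
CREATE MATERIALIZED VIEW events_raw_mv TO events_raw AS
SELECT ts, user_id FROM kafka_events;

-- Second materialized view: aggregate the data produced by the first one (the cascade)
CREATE MATERIALIZED VIEW events_daily_mv TO events_daily AS
SELECT toDate(ts) AS day, count() AS events FROM events_raw GROUP BY day;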

Kafka

Altinity took over support for the Kafka engine a few months ago. Kafka functionality and stability have been improved in this release, in particular:

  • ClickHouse can now act as a Kafka producer: not just read from Kafka, but also send data back to Kafka with an INSERT statement (see the sketch after this list).
  • Atomic parsing of each message: ‘kafka_row_delimiter’ is now obsolete (ignored)
  • More reliable commit logic
  • Virtual columns ‘_partition’ and ‘_timestamp’ for Kafka engine table.
  • Parsing of most of the formats is working properly now
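
A minimal sketch of the producer path mentioned above (broker, topic, and column names are hypothetical): inserting into a Kafka engine table publishes the rows to the configured topic.
CREATE TABLE kafka_out (ts DateTime, message String)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092', kafka_topic_list = 'outgoing',
         kafka_group_name = 'clickhouse_writer', kafka_format = 'JSONEachRow';

INSERT INTO kafka_out VALUES (now(), 'hello from ClickHouse');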

Backward Incompatible Changes

  • count() supports only a single argument
  • Legacy ‘asterisk_left_columns_only’ setting has been removed (it was disabled by default).
  • Numeric values for Enums can now be used directly in ‘IN’ section of the query
  • Changed serialization format of ‘bitmap*’ aggregate function states to improve performance. Serialized states of ‘bitmap*’ from previous versions cannot be read. If you happen to use bitmap aggregate functions, please contact us before upgrading.
  • The ‘type’ column of system.query_log was changed from UInt8 to Enum8
  • ANY RIGHT/FULL/INNER JOIN is disabled by default. Set the ‘any_join_distinct_right_table_keys’ setting to enable them.
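
For example, assuming hypothetical tables t1 and t2 joined by an id column, the legacy behavior can be re-enabled per session:
SET any_join_distinct_right_table_keys = 1;
SELECT * FROM t1 ANY RIGHT JOIN t2 USING (id);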

Known issues

  • In case of topic rebalancing on the Kafka side, duplicates are possible. We are going to address it in 2020/Q1. https://github.com/ClickHouse/ClickHouse/issues/7259
  • SELECT from the mysql() table function is broken in this release. Two workarounds are possible:
    • Use MySQL table engine
    • ‘create table as mysql(…)’ and then select from this table; this is effectively a shortcut to the MySQL table engine (see the sketch after this list)
  • Some constant-like functions can misbehave when queried through Distributed tables: https://github.com/ClickHouse/ClickHouse/issues/7666
  • ‘system reload dictionary’ does not reload dictionaries with ‘update_field’: https://github.com/ClickHouse/ClickHouse/issues/7440
  • Complicated WHERE conditions involving UInt8 fields with values greater than 1 can return unexpected results (workaround: instead of ‘a AND b AND c’ use ‘a <> 0 AND b <> 0 AND c <> 0’): https://github.com/ClickHouse/ClickHouse/issues/7772
  • If you want to preserve the old ANY JOIN behavior while upgrading from a version before 19.13, you may need to install 19.13.7 first, change the ‘any_join_distinct_right_table_keys’ setting there, and only then upgrade to 19.16. However, we recommend reviewing your queries and rewriting them without this join type. In future releases the old behavior will be available again, but under a different name (SEMI JOIN).
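
Regarding the mysql() workaround above, here is a hedged sketch of the second option; the connection parameters and table names are placeholders:
CREATE TABLE mysql_orders AS mysql('mysql-host:3306', 'shop', 'orders', 'mysql_user', 'password');
SELECT count() FROM mysql_orders;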

Aside from the known issues above, we can confirm that 19.16.10.44 is well tested and stable: we have not seen any significant issues while operating it in our environment and in several user installations. We recommend upgrading to this version if you are using an older one.

Please contact us at info@altinity.com if you experience any issues with the upgrade.  Meanwhile, enjoy the new stable release!


Appendix

Curious minds can find some interesting reading in this section. We carefully analyzed all commits and created a comprehensive list of other changes in various parts of the ClickHouse server.

New Formats

  • ORC format.
  • Template/TemplateIgnoreSpaces formats. They allow parsing and generating data in custom text formats, so you can, for example, generate HTML directly from ClickHouse and turn ClickHouse into a web server.
  • CustomSeparated/CustomSeparatedIgnoreSpaces formats. They support custom escaping and delimiter rules (see the sketch after this list).
  • JSONEachRowWithProgress
  • Parse unquoted NULL literal as NULL (enabled by setting ‘format_csv_unquoted_null_literal_as_null’).
  • Initialize null fields with default values if the data type of this field is not nullable (enabled by setting ‘input_format_null_as_default’).
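
As a small, hedged sketch of the CustomSeparated output format mentioned above (the delimiter value is arbitrary, and only the ‘format_custom_field_delimiter’ setting listed in the appendix is used):
SELECT number, toString(number * 2) AS doubled
FROM numbers(3)
SETTINGS format_custom_field_delimiter = ' | '
FORMAT CustomSeparated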

New or improved functions

  • Aggregate function combinators that fill in a null or default value when there is nothing to aggregate: -OrDefault, -OrNull (see the sketch after this list)
  • Introduce uniqCombined64() to get sane results for cardinality > UINT_MAX
  • QuantileExactExclusive and Inclusive aggregate functions
  • hasToken/hasTokenCaseInsensitive (look for the token in string the same way as token_bf secondary index)
  • multiFuzzyMatchAllIndices, multiMatchAllIndices (return the Array of all matched indices in multiMatch family functions)
  • repeat function for strings
  • sigmoid and tanh functions for ML applications
  • Roaring Bitmaps:
    • Changed CRoaring serialization functions (you will not be able to read bitmaps created by earlier versions)
    • bitmapSubsetInRange
    • bitmapMin, bitmapMax
    • bitmapSubsetLimit(bitmap, range_start, limit)
    • groupBitmapAnd, groupBitmapOr, groupBitmapXor
  • geohashesInBox(longitude_min, latitude_min, longitude_max, latitude_max, precision) which creates an array of precision-long strings of geohash boxes covering the provided area.
  • Support for wildcards in paths of table functions file and hdfs. If the path contains wildcards, the table will be readonly
SELECT * FROM hdfs('hdfs://hdfs1:9000/some_dir/another_dir/*/file', 'Parquet', 'col1 String')
  • New function neighbour(value, offset[, default_value]). It allows reaching the previous/next row within a column.
  • Optimized queries with an ORDER BY expressions clause where the expressions share a prefix with the sorting key of a MergeTree table. This optimization is controlled by the ‘optimize_read_in_order’ setting.
  • New functions arraySplit and arrayReverseSplit, which can split an array by “cut off” conditions. They are useful in time sequence handling.
  • Table function values (the name is case-insensitive). It allows creating a table with some data inline.
SELECT * FROM VALUES('a UInt64, s String', (1, 'one'), (2, 'two'), (3, 'three'))
  • fullHostName (alias FQDN)
  • numbers_mt() — multithreaded version of numbers(). 
  • currentUser() (and alias user()), returning login of authorized user.
  • S3 engine and table function. Partial support in this release (no authentication), complete version is expected in 19.18.x and later
  • WITH TIES modifier for LIMIT
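
To illustrate the -OrDefault / -OrNull combinators mentioned at the top of this list: when the filter matches no rows, -OrDefault falls back to the type’s default value and -OrNull returns NULL.
SELECT
    maxOrDefault(number) AS max_or_default, -- 0 when there is nothing to aggregate
    maxOrNull(number) AS max_or_null        -- NULL when there is nothing to aggregate
FROM numbers(10)
WHERE number > 100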

New dictionaries features

  • Redis as an external dictionary source
  • <sparse_hashed> dictionary layout, which is functionally equivalent to the <hashed> layout but more memory efficient. It uses about half as much memory at the cost of slower value retrieval.
  • ‘allow_dictionaries’ user setting that works similarly to ‘allow_databases’.
  • New attributes for the HTTP dictionary source: credentials and http-headers.

Operations / Monitoring

  • system.metric_log table, which stores values of system.events and system.metrics at a specified time interval.
  • system.text_log table, which stores ClickHouse server logs in ClickHouse itself.
  • Support for detached parts removal:
ALTER TABLE <table_name> DROP DETACHED PART '<part_id>'
  • MergeTree now has an additional option ‘ttl_only_drop_parts’ (disabled by default) that avoids partial pruning of parts, so that parts are dropped completely once all rows in them have expired.
  • Added miscellaneous function getMacro(name) that returns a String with the value of the corresponding <macros> entry from the configuration file of the server where the function is executed.
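
A minimal example of getMacro, assuming the server configuration defines <macros> entries named ‘shard’ and ‘replica’ (the names are hypothetical):
SELECT getMacro('shard') AS shard, getMacro('replica') AS replica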

New metrics & events

  • system.metrics: DistributedFilesToInsert, GlobalThread, GlobalThreadActive, LocalThread, LocalThreadActive
  • system.events: Merge

New system tables:

  • system.disks
  • system.storage_policies

New columns in system tables:

  • system.clusters: errors_count, estimated_recovery_time
  • system.collations: language
  • system.parts: disk_name
  • system.parts_columns: disk_name
  • system.processes: os_thread_ids
  • system.query_log: os_thread_ids
  • system.tables: storage_policy

 

system.merge_tree_settings changed/added

Name | Default | Description
concurrent_part_removal_threshold | 100 | Activate concurrent part removal (see ‘max_part_removal_threads’) only if the number of inactive data parts is at least this.
max_part_loading_threads | auto(6) | The number of threads to load data parts at startup.
max_part_removal_threads | auto(6) | The number of threads for concurrent removal of inactive data parts. One is usually enough, but on ‘Google Compute Environment SSD Persistent Disks’ the file removal (unlink) operation is extraordinarily slow and you probably have to increase this number (up to 16 is recommended).
storage_policy | default | Name of the storage disk policy
ttl_only_drop_parts | 0 | Only drop expired parts altogether instead of partially pruning them.

system.settings changed/added

Name | Default | Description
allow_drop_detached | 0 | Allow ALTER TABLE … DROP DETACHED PART[ITION] … queries
allow_experimental_live_view | 0 | Enable LIVE VIEW. Not mature enough.
allow_introspection_functions | 0 | Allow functions for introspection of ELF and DWARF for query profiling. These functions are slow and may impose security considerations.
any_join_distinct_right_table_keys | 0 (was 1) | Enable old ANY JOIN logic with many-to-one left-to-right table keys mapping for all ANY JOINs. It leads to confusing, non-equal results for ‘t1 ANY LEFT JOIN t2’ and ‘t2 ANY RIGHT JOIN t1’. ANY RIGHT JOIN needs one-to-many keys mapping to be consistent with the LEFT one.
asterisk_left_columns_only | removed |
connection_pool_max_wait_ms | 0 | The wait time when the connection pool is full.
default_max_bytes_in_join | 100000000 | Maximum size of the right-side table if a limit is required but max_bytes_in_join is not set.
distributed_directory_monitor_max_sleep_time_ms | 30000 | Maximum sleep time for StorageDistributed DirectoryMonitors; it limits exponential growth too.
distributed_replica_error_cap | 1000 | Max number of errors per replica; prevents piling up an incredible amount of errors if a replica was offline for some time, and allows it to be reconsidered in a shorter amount of time.
distributed_replica_error_half_life | 60 | Time period over which the replica error counter is halved.
format_custom_escaping_rule | Escaped | Field escaping rule (for CustomSeparated format)
format_custom_field_delimiter |  | Delimiter between fields (for CustomSeparated format)
format_custom_result_after_delimiter |  | Suffix after result set (for CustomSeparated format)
format_custom_result_before_delimiter |  | Prefix before result set (for CustomSeparated format)
format_custom_row_after_delimiter |  | Delimiter after the field of the last column (for CustomSeparated format)
format_template_resultset |  | Path to a file which contains the format string for the result set (for Template format)
format_template_row |  | Path to a file which contains the format string for rows (for Template format)
format_template_rows_between_delimiter |  | Delimiter between rows (for Template format)
input_format_csv_unquoted_null_literal_as_null | 0 | Consider unquoted NULL literal as \N
input_format_null_as_default | 0 | For text input formats, initialize null fields with default values if the data type of the field is not nullable
input_format_tsv_empty_as_default | 0 | Treat empty fields in TSV input as default values.
input_format_values_accurate_types_of_literals | 1 | For Values format: when parsing and interpreting expressions using a template, check the actual type of the literal to avoid possible overflow and precision issues.
input_format_values_deduce_templates_of_expressions | 0 | For Values format: if a field could not be parsed by the streaming parser, run the SQL parser, deduce a template of the SQL expression, try to parse all rows using the template, and then interpret the expression for all rows.
joined_subquery_requires_alias | 0 | Force joined subqueries to have aliases for correct name qualification.
kafka_max_wait_ms | 5000 | The wait time for reading from Kafka before retry.
live_view_heartbeat_interval | 15 | The heartbeat interval in seconds to indicate that a live query is alive.
max_http_get_redirects | 0 | Max number of HTTP GET redirect hops allowed. Make sure additional security measures are in place to prevent a malicious server from redirecting your requests to unexpected services.
max_live_view_insert_blocks_before_refresh | 64 | Limit maximum number of inserted blocks after which mergeable blocks are dropped and the query is re-executed.
min_free_disk_space_for_temporary_data | 0 | The minimum disk space to keep while writing temporary data used in external sorting and aggregation.
optimize_read_in_order | 1 | Enable ORDER BY optimization for reading data in the corresponding order in MergeTree tables.
partial_merge_join | 0 | Use partial merge join instead of hash join for LEFT and INNER JOINs.
partial_merge_join_optimizations | 0 | Enable optimizations in partial merge join
partial_merge_join_rows_in_left_blocks | 10000 | Group left-hand joining data in bigger blocks. Setting it to a bigger value increases JOIN performance and memory usage.
partial_merge_join_rows_in_right_blocks | 10000 | Split right-hand joining data into blocks of the specified size. It is a portion of data indexed by min-max values and possibly unloaded on disk.
query_profiler_cpu_time_period_ns | 1000000000 (was 0) | Highly experimental. Period for the CPU clock timer of the query profiler (in nanoseconds). Set to 0 to turn off the CPU clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.
query_profiler_real_time_period_ns | 1000000000 (was 0) | Highly experimental. Period for the real clock timer of the query profiler (in nanoseconds). Set to 0 to turn off the real clock query profiler. Recommended value is at least 10000000 (100 times a second) for single queries or 1000000000 (once a second) for cluster-wide profiling.
replace_running_query_max_wait_ms | 5000 | The wait time for a running query with the same query_id to finish when the ‘replace_running_query’ setting is active.
s3_min_upload_part_size | 536870912 | The minimum size of a part to upload during multipart upload to S3.
temporary_live_view_timeout | 5 | Timeout after which a temporary live view is deleted.