July, 16, 2018

Introduction

ProxySQL is a popular open source, high performance and protocol-aware proxy server for MySQL and its forks. Since September 2017 ProxySQL supports ClickHouse as a backend, so clients can connect to ClickHouse via MySQL protocol. In practice, this helps MySQL-aware applications to start using ClickHouse as without changes in the client library. However, there are several limitations to this approach:

Supported command set in ProxySQL is limited
BINARY protocol (prepared statements) is not supported by ProxySQL, and the major one
ClickHouse and MySQL do not have the same SQL syntax and functions!

The latter limitation motivated ProxySQL creator René Cannaò to add additional functionality for query rewrite. With his permission, we cross-post his article describing new functionality in our blog.

To support ClickHouse as a backend, ProxySQL acts as a data bridge between MySQL protocol and ClickHouse protocol, allowing MySQL clients to execute queries in ClickHouse through it.
When we started using it in real-world scenarios, we faced an expected issue: ClickHouse’s SQL query syntax is different than MySQL’s syntax, and migrating an application from MySQL to ClickHouse isn’t just a matter of changing connections endpoint but it also requires modifying some queries. This needs development time, but not always possible.
One of ProxySQL most widely used feature is indeed the ability to rewrite queries, so often it is just a matter of writing the right query rules.
Let make an example.
We originally had this query:

SELECT COUNT(`id`), FROM_UNIXTIME(`created`, '%Y-%m') AS `date` 
FROM `tablename` 
GROUP BY FROM_UNIXTIME(`created`, '%Y-%m')

ClickHouse doesn’t support FROM_UNIXTIME, but it supports toDate and toTime.
ClickHouse also supports toYear and toMonth, it’s useful to format the date the same way FROM_UNIXTIME does.
Therefore, it is possible to rewrite the query as:

SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY toYear(toDate(created)), toMonth(toDate(created));

To perform the above rewrite, we will need two rules, one for the first FROM_UNIXTIME, and one for the second one. Or we can just use one rewrite rules to replace FROM_UNIXTIME(created, '%Y-%m') no matter if on the retrieved fields or in the GROUP BY clause, generatinging the following query:

SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created))));

Does it look great? No, not yet!
For the month of March, concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) will return 2018-3 : not what the application was expecting, as MySQL would return 2018-03 . The same applies for all the first 9 months of each year.
Finally, we rewrote the query as the following, and the application was happy:

SELECT COUNT(`id`), substring(toString(toDate(created)),1,7) AS `date`
FROM `tablename`
GROUP BY substring(toString(toDate(created)),1,7);

Note: because of the datatypes conversions that ClickHouse needs to perform in order to execute the above query, its execution time is about 50% slower than executing the following query:

SELECT COUNT(`id`), concat(toString(toYear(toDate(created))), '-', toString(toMonth(toDate(created)))) AS `date`
FROM `tablename`
GROUP BY toYear(toDate(created)), toMonth(toDate(created));

Architecture using 2 ProxySQLs

Great, we now know how to rewrite the query!
Although, the ClickHouse module in ProxySQL doesn’t support query rewrite. The ClickHouse module in ProxySQL is only responsible to transform data between MySQL and ClickHouse protocol, and vice-versa.

Therefore the right way of achieving this solution is to configure two ProxySQL layers, one instance responsible for rewriting the query and sending the rewritten query to the second ProxySQL instance, this one responsible for executing the query (already modified) on ClickHouse.

Architecture using only one ProxySQL

Does the above architecture seem complex? Not really, it is reasonably straightforward.
Can it be improved?
As you can see from the previous chart, the ClickHouse module and the MySQL module listen on different ports. The first ProxySQL instance is receiving traffic on port 6033, and sending traffic on the second PorxySQL instance on port 6090.
Are two instances really required? The answer is no.
In fact, a single instance can receive MySQL traffic on port 6033, rewrite the query, and send the rewritten query to itself on port 6090, to finally execute the rewritten query on ClickHouse.
This diagram describes the architecture:

Configuration

For the reference, below is the step to configure one single ProxySQL to send traffic to ClickHouse, and use itself as a backend.

Create ClickHouse user:

INSERT INTO clickhouse_users (username,password) VALUES ('clicku','clickp');
LOAD CLICKHOUSE USERS TO RUNTIME;
SAVE CLICKHOUSE USERS TO DISK;

Create MySQL user (same as ClickHouse):

INSERT INTO mysql_users(username,password) SELECT username, password FROM clickhouse_users;
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;

Configure ProxySQL itself as a backend for MySQL traffic:

INSERT INTO mysql_servers(hostname,port) VALUES ('127.0.0.1',6090);
SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL SERVERS TO RUNTIME;

Create a query rule for rewriting queries:

INSERT INTO mysql_query_rules (active,match_pattern,replace_pattern,re_modifiers) VALUES 
(1,"FROM_UNIXTIME(`created`, '%Y-%m')", 'substring(toString(toDate(created)),1,7)',"CASELESS,GLOBAL");
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

This is a very simple example to demonstrate how to perform query rewrite from MySQL to ClickHouse using just one ProxySQL instance.
In a real-world scenarios you will need to create more rules based on your own queries.

Conclusion

Not only ProxySQL allows to send queries to ClickHouse, but it also allows to rewrite queries to solve issues related to different SQL syntax and available functions.
To achieve this, ProxySQL uses its ability to use itself as a backend: rewrite the query in the MySQL module, and execute it in the ClickHouse module.

Cross-post from ClickHouse and ProxySQL queries rewrite

Get in touch with ClickHouse experts.