Altinity

Blog

MySQL to ClickHouse data migration and replication

Dec 5, 2017

Altinity has developed an open source clickhouse-mysql tool that helps to migrate existing MySQL tables to ClickHouse and setup realtime replication of changes from MySQL to ClickHouse (INSERTS only). Here we demonstrate how it can be done by a simple example.

We will be using a subset of airline ontime test dataset:

mysql> select count(*) from airline.ontime;   
+----------+   
| count(*) |   
+----------+   
|  1289551 |   
+----------+

Migration process consists of three steps:

  1. Create CH table
  2. Copy existing data from MySQL to CH
  3. Setup MySQL-to-ClickHouse replication


Prerequisites

clickhouse-mysql is a Python script, so Python >=3.5 needs to be installed. 'pypy' is better from performance prospective.

Following modules are needed for MySQL and ClickHouse integrations:

  • pip install mysqlclient
  • pip install mysql-replication
  • pip install clickhouse-driver

MySQL should have binlog enabled for replication.


1. Create ClickHouse table.

clickhouse-mysql has several modes. We start from ClickHouse table creation helper.

pypy clickhouse-mysql.py --log-file=/dev/null --src-host=127.0.0.1 --src-user=root --src-only-tables=airline.ontime --table-templates

That will generate ClickHouse table DDL template that needs to be manually edited.

CREATE TABLE `airline`.`ontime` (
    `Year` Nullable(UInt16),
... CUT ...
    `Div5TailNum` Nullable(String)
) ENGINE = MergeTree(<PRIMARY_DATE_FIELD>, 
(<COMMA_SEPARATED_INDEX_FIELDS_LIST>), 8192)

Following changes need to be done manually:

  1. Specify partitioning column (<PRIMARY_DATE_FIELD>).
  2. Specify primary key columns (<COMMA_SEPARATED_INDEX_FIELDS_LIST>)
  3. Make sure partitioning and primary key columns are not nullable.

For example, it may look like

CREATE TABLE `airline`.`ontime` (
    `Year` UInt16,
...CUT...
    `Div5TailNum` Nullable(String)
) ENGINE = MergeTree(FlightDate, (FlightDate, Year, Month), 8192)

Once the table definition is ready, run clickhouse-client to create it into interactive mode or create from file.


2. Copy existing data from MySQL to ClickHouse

Data copy can be performed using standard MySQL and ClickHouse commands.

sudo mysqldump \
-u root \
--tz-utc \
--quick \
--fields-terminated-by=, \
--fields-optionally-enclosed-by=\" \
--fields-escaped-by=\\ \
--tab="$CSV_FILES_DIR"/ \
airline ontime

sudo cat "$CSV_FILES_DIR"/ontime.txt | clickhouse-client --query="INSERT INTO airline.ontime FORMAT CSV"

If there are no errors you can check the table in clickhouse-client:

:) select count() from airline.ontime;

SELECT count()
FROM airline.ontime

┌─count()─┐
│ 1289551 │
└─────────┘


3. Setup MySQL-to-ClickHouse replication

Replication log reader for ontime database can be started by an example shell script:

./examples/run_airline_ontime_data_mysql_to_ch_reader.sh

If you are curious what it does, it runs 'clickhouse-mysql' with following parameters:

pypy clickhouse-mysql \
--src-resume \
--src-wait \
--nice-pause=1 \
--log-level=info \
--log-file=ontime.log \
--src-host=127.0.0.1 \
--src-user=root \
--dst-host=127.0.0.1 \
--csvpool \
--csvpool-file-path-prefix=qwe_ \
--mempool-max-flush-interval=60 \
--mempool-max-events-num=10000

Now let's test how inserts are propagated from MySQL to ClickHouse. We will duplicate the data in MySQL.

mysql> insert into airline.ontime select * from airline.ontime;

 mysql> select count(*) from airline.ontime;
+----------+
| count(*) |
+----------+
|  2579102 |
+----------+

And check on ClickHouse side.

:) select count() from airline.ontime;

SELECT count()
FROM airline.ontime

┌─count()─┐
│ 2579102 │
└─────────┘

Inserted records were populated to ClickHouse.

Conclusion

Several months ago ProxySQL introduced ClickHouse support allowing to access ClickHouse via MySQL protocol. We took this further and demonstrated that it is possible to have MySQL data in ClickHouse in realtime using MySQL binlog reading technique. There are multiple ways to do that. Custom implementation like Altinity 'clickhouse-mysql' prototype is one approach, or more universal technologies can be used, for example Uber's StorageTapper that generates Kafka stream from MySQL binlogs that can be later inserted to ClickHouse using Kafka engine or dedicated consumer.