Tableau with ClickHouse
July 19, 2018
ClickHouse users often require data to be accessed in a user-friendly way. There is a number of tools that can display big data using visualization effects, charts, filters, etc. Tableau is one of the popular ones. In this article, we will explain how Tableau can be used with ClickHouse data source.
For the purpose of this article, we will use a free trial distribution of the desktop version for Windows platform (https://www.tableau.com/products/desktop).
So download it by specifying your email address and install it once a download is complete. It will require to fill up the form before you will start using it.
In order to fetch the data from ClickHouse server, it is required to install and configure the proper driver for that particular connection. This driver could be found on Yandex github repo here:
Pick a correct version of it and install it.
Once it’s installed you can configure it using Windows ODBC Admin Tool called “ODBC Data Sources (XX-bit)”. Important Note: Instead of XX use the same number as you have selected for connection driver.
Click Add… button
Pick the correct driver
After these steps restart Tableau application if it was running.
Here is another article on how to do it, made by MySQL Team: https://dev.mysql.com/doc/connector-odbc/en/connector-odbc-configuration-dsn-windows-5-2.html
Add data source using ODBC driver. On the main screen of Tableau application, there is a list of supported types of Databases. We pick the one that says “Other Databases (ODBC)” and select a previously configured ClickHouse server connection from the list of DSNs.
The next step is to setup data source using a selected connection.
It is as simple as it gets. Just click the Search button (a magnifier icon) and the application will connect to the database and fetch the list of available tables. From that list, we can either drag and drop one table to the work area or double-click on it to get the same results.
We can add more tables and configure joins right away to prepare a complicated data source but for now, we leave only one table there.
Once data source setup is complete we can start doing some stuff with the actual data.
Custom SQL Definition
In the example above we used a simple table pick from the list of available tables. What if we have a more complicated use case with extra fields or joins? In that case, we could use a custom query definition. So here is a simple definition of a query with an additional field picked from an external dictionary.
Here we define a new field with alias ‘t’ along with other fields in the table ‘yellow_tripdata’ which will contain values of ‘max_temperature’ from an external dictionary called ‘weather’.
And now we can pick this extra field from dictionary right to our Measures and into the chart. We're going to take a closer look at setting up Dimensions, Measures, Filters in the next section.
We are going to use one of the public datasets based on flights activity in the US. This dataset is also used by Yandex in their ClickHouse documentation.
Basic info about the dataset: it has one table with 30+ columns representing flight departure and arrival date time, distance between origin and destination spots, travel duration, etc.
As for our use case, let's state that we need to measure dynamics of the number of flights, complete from California to New York in a period of time. Also, let’s check whether the average distance was the same for all flights.
We have configured a data source, so we can fetch data from only one table. Let’s now make a chart that displays the number of flights over a period of time.
In order to do so in Tableau, one should go to a “Sheet” window of the current Workbook first.
On a brand new sheet you can see page areas for columns and rows definitions. Add a Dimension to the Column area (drag and drop) and Tableau will divide fetched data by the values from that Dimension. By adding a Measure to the Rows area you will get different metric values grouped by added Dimension.
In our case, we picked WEEK(FlightDate) as Column Dimension and we picked two Measures: Distance and COUNT(*) - which is basically Number of Flights for that period.
Now we have automatically generated and live chart that fetches data from our ClickHouse data source and displays all that in a visual way.
Everything is set and we can add some Filters. As we said before we need to filter out only those flights that have flown from California to New York state. To do so we can simply drag and drop Dimensions "Origin State" and "Dest State" respectively to the Filters area of the Sheet. When we pull one of our filters, Tableau will show a dialog where we can setup values for that particular filter.
And here we set up both filters:
So here it is! Just a few clicks and the data from ClickHouse is displayed in a user-friendly way in Tableau.