Mar 12, 2018
What Is ClickTail
Altinity is happy to introduce ClickTail -- an easy-to-use tool for ingesting log file data into ClickHouse and making it available for exploration. Its favorite format is JSON, but it also knows how to parse a range of other well-known log formats. ClickTail was created as a fork of the HoneyTail project developed by Honeycomb.io.
Project information can be found on GitHub here: https://github.com/Altinity/clicktail
What Is It Made For
ClickTail can parse different types of log files, including the most popular ones such as Nginx access logs and MySQL slow and audit logs. Once parsed, the data is loaded into the ClickHouse DBMS for further analysis.
There are a number of similar tools already available that parse logs and load them into various data stores, for example Logstash, Honeycomb, ApacheLogsParser, etc. So why another one? ClickTail not only uses ClickHouse as the data store, which makes log data analysis blazingly fast, but it also normalizes text data from the log files, such as SQL queries or URLs in the respective log formats. Normalizing means replacing the actual queries/URLs with a pattern that can later be used in a filter or GROUP BY clause.
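To make the idea concrete, here is a minimal sketch of such normalization in Go (ClickTail's implementation language). This is an illustration only, not ClickTail's actual algorithm: it simply replaces string and numeric literals in a SQL statement with '?', so that structurally identical queries collapse into one pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative sketch, not ClickTail's real normalizer: collapse
// literals so queries that differ only in values share one pattern.
var (
	stringLit = regexp.MustCompile(`'[^']*'`)       // quoted string literals
	numberLit = regexp.MustCompile(`\b\d+(\.\d+)?\b`) // integer/decimal literals
)

func normalizeQuery(q string) string {
	q = stringLit.ReplaceAllString(q, "?")
	q = numberLit.ReplaceAllString(q, "?")
	return q
}

func main() {
	fmt.Println(normalizeQuery("SELECT * FROM driver WHERE id = 42 AND name = 'Bob'"))
	// → SELECT * FROM driver WHERE id = ? AND name = ?
}
```

With a normalized pattern stored next to the raw query, counting or filtering "the same query with different parameters" becomes a plain GROUP BY, as the examples later in this post show.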
ClickTail is written in Go, so it is natively fast, and it processes data in parallel in order to utilize the CPU as efficiently as possible.
How to Use It
First of all, ClickTail can be installed from rpm/deb packages or built from the sources published on GitHub. Altinity's package repository is here: https://packagecloud.io/Altinity/clickhouse and the source code repo is here: https://github.com/Altinity/clicktail
# curl -s https://packagecloud.io/install/repositories/Altinity/clickhouse/script.deb.sh | bash
# apt-get install clicktail
Once ClickTail is installed, you can use it as a CLI tool right away, or configure it as a service and run it in the background.
ClickTail supports reading from STDIN as well as from a file on disk. A complete list of parsers can be found in the parsers/ directory. At the time of this writing, ClickTail supports logs generated by:
- MySQL (Slow and Audit Logs)
- Nginx (Access Logs)
Let’s look at a couple of examples of how to configure ClickTail for specific log types.
Nginx Access Logs
In this example, we will use a 1GB access log file (approx. 3.4M rows) from a typical Nginx server.
We start with database schema preparation, using the table definitions from our repository: https://github.com/Altinity/clicktail/blob/master/schema/
We assume the ClickHouse server and client are installed on the same server where we run ClickTail. First, we create a separate database:
$ cat schema/db.sql | clickhouse-client --multiline
Then we create the clicktail.nginx_log table:
$ cat schema/nginx.sql | clickhouse-client --multiline
Once the schema in ClickHouse is prepared, let’s run ClickTail and tell it to parse the existing log file:
$ clicktail -p nginx -f ./access.log -d clicktail.nginx_log --nginx.conf=/etc/nginx/nginx.conf --nginx.format=combined --backfill
As you can see, there are two options that deserve an explanation: nginx.conf and nginx.format -- those two are used to identify the Nginx log format so that log entries can be parsed correctly.
On our test server it took about 5 minutes to load the 1GB log file into ClickHouse.
After ClickTail is done parsing the log file, we can start executing queries using clickhouse-client. Here are a few example queries.
The first one counts the number of requests with successful responses (20x-30x HTTP response codes) and the number of requests with errors (40x-50x codes):
SELECT if(status >= 400, '40x', 'ok'), count()
FROM nginx_log
GROUP BY if(status >= 400, '40x', 'ok')
FORMAT TSVRaw
ok	3399777
40x	39649

2 rows in set. Elapsed: 0.027 sec. Processed 3.44 million rows, 13.76 MB (125.19 million rows/s., 500.74 MB/s.)
So here we get a result in less than a tenth of a second by processing about 3.5M records.
Here is another query example which gets the top 10 most requested pages, excluding some static content:
SELECT request_shape, count()
FROM nginx_log
WHERE (request_shape NOT LIKE '%.js%') AND (request_shape NOT LIKE '%.css%') AND (request_shape NOT LIKE '%.png%') AND (request_shape NOT LIKE '%.gif%') AND (request_shape NOT LIKE '%.jpg')
GROUP BY request_shape
ORDER BY count() DESC
LIMIT 10

┌─request_shape─────────────────────────────────────────────────────────────────────────┬─count()─┐
│ /services/                                                                            │  566945 │
│ /robots.txt                                                                           │  406599 │
│ /                                                                                     │  181174 │
│ /wp-login.php                                                                         │  171027 │
│ /blog/using-sphinx-search-engine-with-chinese-japanese-and-korean-language-documents/ │   49972 │
│ /feed/                                                                                │   46878 │
│ /joombah/index.php?format=?&option=?                                                  │   31458 │
│ /contact-us/                                                                          │   14029 │
│ /check_ivinco_contact_form.php                                                        │   11001 │
│ /software/wordpress-sphinx-search/                                                    │   10383 │
└───────────────────────────────────────────────────────────────────────────────────────┴─────────┘

10 rows in set. Elapsed: 0.097 sec. Processed 3.44 million rows, 160.58 MB (35.38 million rows/s., 1.65 GB/s.)
Here we used the request_shape column instead of the original request attribute in order to match requests by URI pattern, e.g. /joombah/index.php?format=?&option=?, where question marks appear in the places where the actual parameter values were in the original requests.
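The same idea can be sketched for URLs. The function below is an illustration of how such a request "shape" could be derived, not ClickTail's exact algorithm: it keeps the path and the parameter names, but replaces every parameter value with '?'.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// requestShape is an illustrative sketch: requests that differ only
// in query-parameter values collapse into one shape, which can then
// be used in GROUP BY or as a filter.
func requestShape(rawURI string) string {
	u, err := url.Parse(rawURI)
	if err != nil {
		return rawURI // leave unparseable URIs as-is
	}
	if u.RawQuery == "" {
		return u.Path
	}
	// collect parameter names and sort them for a stable shape
	keys := make([]string, 0, len(u.Query()))
	for k := range u.Query() {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"=?")
	}
	return u.Path + "?" + strings.Join(parts, "&")
}

func main() {
	fmt.Println(requestShape("/joombah/index.php?format=json&option=com_user"))
	// → /joombah/index.php?format=?&option=?
}
```
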
MySQL Slow Logs
ClickTail parses all the attributes from MySQL slow, Nginx access and MySQL audit log files, including some extra attributes that only Percona Server provides. That means whatever appears in the log file can be used as a filter, group or order-by field in ClickHouse.
Here we will see how to start parsing MySQL slow log files. First of all, we need to create a schema in ClickHouse where parsed log entries will be stored:
$ cat schema/mysql.sql | clickhouse-client --multiline
Then, if we need to load entries that have already been written to the log file, we run ClickTail with the --backfill option:
$ clicktail -p mysql -f /var/lib/mysql/centos-slow.log -d clicktail.mysql_slow_log --backfill
After that we can start ClickTail to watch for new entries, simply by removing the --backfill option:
$ clicktail -p mysql -f /var/lib/mysql/centos-slow.log -d clicktail.mysql_slow_log
Once the log file is parsed into the ClickHouse table, we can start querying the data. Here is a simple query that gets the number of log entries and the average and maximum query times, grouped by date:
SELECT toDate(_date) AS t, count(), max(query_time), avg(query_time)
FROM mysql_slow_log
GROUP BY t
ORDER BY t ASC

┌──────────t─┬─count()─┬─max(query_time)─┬───────avg(query_time)─┐
│ 2018-02-21 │   31920 │       4.3042603 │  0.008569910840723489 │
│ 2018-02-22 │   70135 │        8.375134 │  0.013615735582736797 │
│ 2018-03-12 │    2446 │     0.058650002 │ 0.0003342101415163659 │
└────────────┴─────────┴─────────────────┴───────────────────────┘

3 rows in set. Elapsed: 0.005 sec. Processed 104.50 thousand rows, 627.01 KB (19.55 million rows/s., 117.29 MB/s.)
Here is one more query that finds the most frequent queries in the whole set:
SELECT normalized_query, count() AS c, avg(query_time)
FROM mysql_slow_log
GROUP BY normalized_query
ORDER BY c DESC
LIMIT 3

┌─normalized_query──────────────────────────────────────────────┬─────c─┬────────avg(query_time)─┐
│ select `competitionteamdriver`.* from `competitionteamdriver` │ 56805 │ 0.00009646791773688537 │
│ select `driver`.* from `driver` where lower(`link`) = ?       │ 14055 │   0.049214935139197974 │
│ select `top25class`.* from `top25class`                       │  8062 │ 0.00009259848763351935 │
└───────────────────────────────────────────────────────────────┴───────┴────────────────────────┘

3 rows in set. Elapsed: 0.011 sec. Processed 104.50 thousand rows, 9.22 MB (9.82 million rows/s., 866.00 MB/s.)
For more examples you may refer to a recent article by Peter Zaitsev of Percona that covers MySQL slow log analysis with ClickTail and ClickHouse: https://www.percona.com/blog/2018/02/28/analyze-raw-mysql-query-logs-clickhouse/
Create a Custom Log Format Parser
ClickTail is open source and easily extensible. It is possible to create a custom log format parser and use all of ClickTail’s benefits and features with your own type of logs.
In order to achieve that, first you need to check out the project sources: https://github.com/Altinity/clicktail
The next step is to create a parser and its package. To do so, create a subfolder under the ‘parsers’ folder and add a package file there. The file name should match both the package name and the subfolder name, e.g.:
$ cd parsers
$ mkdir customlog
$ echo "package customlog" > customlog/customlog.go
See the other parsers in the ‘parsers’ folder as examples and guidance on how to structure the interfaces and types of a custom log parser.
Once the parser package is ready, the new parser needs to be registered so it can be used at runtime. This is done in the following files (in their respective sections):
- leash.go: import section and getParserAndOptions function;
- main.go: import section, validParsers array and GlobalOptions struct.
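To give a feel for what such a parser does, here is a self-contained sketch in Go. The Event type and the ProcessLine method below are assumptions for illustration only; the real interfaces live in the parsers package of the ClickTail sources, so use the existing parsers as the authoritative reference for signatures.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Event is a hypothetical parsed log record: a timestamp plus a map
// of attributes, which would eventually become ClickHouse columns.
type Event struct {
	Timestamp time.Time
	Data      map[string]interface{}
}

// CustomLogParser is an illustrative parser for a "key=value key=value"
// line format; it stands in for the interface real ClickTail parsers
// implement.
type CustomLogParser struct{}

// ProcessLine turns one raw log line into an Event with parsed fields.
func (p *CustomLogParser) ProcessLine(line string) (Event, error) {
	ev := Event{Timestamp: time.Now(), Data: map[string]interface{}{}}
	for _, field := range strings.Fields(line) {
		kv := strings.SplitN(field, "=", 2)
		if len(kv) != 2 {
			return ev, fmt.Errorf("malformed field %q", field)
		}
		ev.Data[kv[0]] = kv[1]
	}
	return ev, nil
}

func main() {
	p := &CustomLogParser{}
	ev, err := p.ProcessLine("user=bob action=login status=ok")
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.Data["user"], ev.Data["action"], ev.Data["status"])
	// → bob login ok
}
```
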
The last step is to create a table schema in ClickHouse describing the attributes of your custom log format.