Upcoming ClickHouse talk at ApacheCon:
Processing CDN Logging Events at Scale
by Geoff Genz, Comcast
'Content Delivery Networks generate staggering amounts of event data at a rate that is exponentially increasing in line with the explosion of IP video. Managing this deluge with the usual suspects of Big Data platforms presents a formidable challenge, particularly in light of the often-competing goals of detailed troubleshooting investigation versus big picture analytics and machine learning. This presentation will explore capturing and processing events from the largest production deployment of Apache Traffic Server and Apache Traffic Control. We will examine some of the drawbacks of standard approaches, including Splunk, the Elastic stack, Hadoop, and time series and OLAP datastores. This will be followed by a deep dive into Comcast’s rapidly evolving MAPLE platform. MAPLE leverages the open source Clickhouse “warp speed” database, along with Apache staples Kafka and Zookeeper, to collect and transform more than two million CDN events per second. This firehose of data is available within seconds at both an individual event and aggregate level for alerting, troubleshooting, analytics, and machine learning.n'