Altinity

ClickHouse vs Amazon Redshift Benchmark #2: STAR2002 dataset

Preface

We continue to benchmark ClickHouse against other column-oriented databases. This time we run another test against Amazon Redshift, using a different dataset and different queries.

For this benchmark we load a 1 TB CSV dataset, built from the STAR2002 experiment data repeated 500 times. We compare the ClickHouse results with the benchmark described in the article "GCE BigQuery vs AWS Redshift vs AWS Athena", where Redshift was tested in two different configurations.

The source dataset can be downloaded from here and needs to be replicated 500 times. The resulting dataset has the following characteristics:

  • CSV size: 997GB (~1TB)
  • 7 928 812 500 lines (~8 billion)
  • 16 columns
  • All columns are either integers, double precision or floats

The tested configuration uses Amazon d2.xlarge EC2 instances with ClickHouse installed.

Data Load

The CSV file was copied to the ClickHouse server in advance. Loading the data took 1 hour and 40 minutes, which is 6 times faster than Redshift needed to load the data from S3.
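To put the load time in perspective, the figures above can be turned into an approximate ingest rate. The short calculation below is only a sketch: it assumes decimal gigabytes (1 GB = 1000 MB) and a uniform load rate over the whole run, and it derives the implied Redshift load time purely from the "6 times" ratio quoted above.

```python
# Figures quoted in this post
CSV_SIZE_GB = 997            # assumption: decimal gigabytes
ROW_COUNT = 7_928_812_500    # lines in the CSV file
LOAD_MINUTES = 100           # 1 hour 40 minutes
REDSHIFT_RATIO = 6           # ClickHouse load was "6 times faster"

load_seconds = LOAD_MINUTES * 60

# Average ingest rate, assuming the load ran at a constant speed
rows_per_second = ROW_COUNT / load_seconds
mb_per_second = CSV_SIZE_GB * 1000 / load_seconds

# Redshift load time implied by the quoted ratio
implied_redshift_hours = LOAD_MINUTES * REDSHIFT_RATIO / 60

print(f"{rows_per_second:,.0f} rows/s")
print(f"{mb_per_second:.0f} MB/s")
print(f"{implied_redshift_hours:.0f} hours for Redshift")
```

This works out to roughly 1.3 million rows per second, or about 166 MB/s of CSV input, and implies a Redshift load of about 10 hours.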

Performance Benchmark

We use the same queries as the referenced article:

  1. SELECT count(*) FROM t
  2. SELECT count(*) FROM t WHERE eventnumber > 1
  3. SELECT count(*) FROM t WHERE eventnumber > 20000
  4. SELECT count(*) FROM t WHERE eventnumber > 500000
  5. SELECT eventFile, count(*) FROM t GROUP BY eventFile
  6. SELECT eventFile, count(*) FROM t WHERE eventnumber > 525000 GROUP BY eventFile
  7. SELECT eventFile, eventTime, count(*) FROM t WHERE eventnumber > 525000 GROUP BY eventFile, eventTime ORDER BY eventFile DESC, eventTime ASC
  8. SELECT MAX(runNumber) FROM t
  9. SELECT AVG(eventTime) FROM t WHERE eventnumber > 20000
  10. SELECT eventFile, AVG(eventTime), AVG(multiplicity), MAX(runNumber), count(*) FROM t WHERE eventnumber > 20000 GROUP BY eventFile
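The post does not describe how the query times were collected, so the following is only a sketch of one way to time such queries. It assumes the `clickhouse-client` command-line tool is available on the PATH and that the table is named `t` as in the queries above. The helper names `run_with_clickhouse_client` and `time_queries`, the number of repeats, and the choice to keep the best run are all illustrative assumptions.

```python
import subprocess
import time
from typing import Callable, Dict, List


def run_with_clickhouse_client(sql: str) -> None:
    """Run one query via the clickhouse-client CLI (assumed to be on PATH)."""
    subprocess.run(
        ["clickhouse-client", "--query", sql],
        check=True,                  # raise if the query fails
        stdout=subprocess.DEVNULL,   # discard the result rows, only time matters
    )


def time_queries(
    queries: List[str],
    run: Callable[[str], None],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best wall-clock time in seconds observed for each query."""
    timings: Dict[str, float] = {}
    for sql in queries:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(sql)
            best = min(best, time.perf_counter() - start)
        timings[sql] = best
    return timings


# A few of the benchmark queries listed above
QUERIES = [
    "SELECT count(*) FROM t",
    "SELECT count(*) FROM t WHERE eventnumber > 1",
    "SELECT MAX(runNumber) FROM t",
]
```

Calling `time_queries(QUERIES, run_with_clickhouse_client)` on the server returns a dictionary that maps each query to its fastest run. Passing the runner as a parameter keeps the timing loop independent of the client, so the same loop could drive a Redshift connection for a like-for-like comparison.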

The results demonstrate that ClickHouse performs much faster than Redshift on the same hardware and is comparable to Redshift on a much more expensive configuration.

Conclusion

ClickHouse again showed great results with a different dataset and different use cases. It is also interesting to compare instance prices for Redshift and ClickHouse:

Redshift:

  • $0.9 per hour for a ds2.xlarge instance, ~$650 per month
  • $5.7 per hour for a dc1.8xlarge instance, ~$4100 per month

ClickHouse:

  • $0.266 per hour for a d2.xlarge instance, ~$190 per month
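
The monthly figures follow from the hourly prices, as the short calculation below shows. It assumes continuous use over a 30-day month (720 hours), which is an assumption that matches the rounded numbers above, not a billing rule stated in the post.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30 days of continuous use

# Hourly prices in USD, as quoted in this post
hourly_usd = {
    "Redshift ds2.xlarge": 0.9,
    "Redshift dc1.8xlarge": 5.7,
    "ClickHouse on d2.xlarge": 0.266,
}

# Monthly cost for each option
monthly_usd = {name: rate * HOURS_PER_MONTH for name, rate in hourly_usd.items()}

# Price of each option relative to the ClickHouse instance
baseline = hourly_usd["ClickHouse on d2.xlarge"]
price_ratio = {name: rate / baseline for name, rate in hourly_usd.items()}

for name in hourly_usd:
    print(f"{name}: ${monthly_usd[name]:.0f}/month, {price_ratio[name]:.1f}x")
```

Under that assumption the monthly costs come to about $648, $4104 and $192. The ds2.xlarge Redshift instance costs about 3.4 times as much as the d2.xlarge instance, and the dc1.8xlarge instance costs about 21 times as much.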