HiBench is a comprehensive benchmark suite consisting of multiple workloads, including both synthetic micro-benchmarks and real-world applications. It features several ready-to-use benchmarks from 4 categories: micro benchmarks, Web search, machine learning, and HDFS benchmarks. It is used for both stream and batch processing.

Web references



Date of last description update


Originating group


Time – first version, last version

2009 – 2019


Benchmark Suite


Micro-benchmark suite covering 6 categories: micro, ML (machine learning), SQL, graph, websearch, and streaming.

Data type and generation/datasets

Most workloads use synthetic data generated from real data samples. The workloads use structured and semi-structured data, including graph, network, text and web data types.

Technology stack and implementation

HiBench can be executed in Docker containers. It is implemented using the following technologies: (1) Hadoop: Apache Hadoop 2.x, CDH5, HDP; (2) Spark: Spark 1.6.x, Spark 2.0.x, Spark 2.1.x, Spark 2.2.x; (3) Flink: 1.0.3; (4) Storm: 1.0.1; (5) Gearpump: 0.8.1; and (6) Kafka:


The measured metrics are execution time (latency), throughput, and system resource utilization (CPU, memory, etc.).
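
As a rough illustration of how the time and throughput metrics relate, the sketch below derives them from a single run. It assumes throughput is defined as input bytes processed divided by wall-clock duration, and that a per-node figure is that value divided by the cluster size. The names `RunMetrics` and `summarize_run` are hypothetical and are not part of HiBench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    """Summary of one benchmark run (hypothetical structure, not HiBench's own)."""
    duration_s: float
    throughput_bytes_per_s: float
    throughput_per_node: float


def summarize_run(input_bytes: int, start_s: float, end_s: float, nodes: int) -> RunMetrics:
    """Derive execution time and throughput from raw run measurements.

    Assumption: throughput = input bytes / wall-clock duration, and
    per-node throughput = throughput / number of worker nodes.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("end_s must be later than start_s")
    throughput = input_bytes / duration
    return RunMetrics(duration, throughput, throughput / nodes)


# Example: 1,000,000 bytes processed in 4 seconds on a 4-node cluster.
metrics = summarize_run(1_000_000, start_s=10.0, end_s=14.0, nodes=4)
print(metrics.duration_s, metrics.throughput_bytes_per_s, metrics.throughput_per_node)
```

Resource utilization (CPU, memory, etc.) is not covered here, since it is sampled by a system monitor during the run rather than derived from the input size and timing.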

Reported results


Reference papers

Huang, Shengsheng, et al. "The HiBench benchmark suite: Characterization of the MapReduce-based data analysis."