HiBench

Description

HiBench is a comprehensive benchmark suite consisting of multiple workloads, including both synthetic micro-benchmarks and real-world applications. It features several ready-to-use benchmarks from 4 categories: micro benchmarks, Web search, machine learning, and HDFS benchmarks. It is used for both stream and batch processing.

Web references

https://github.com/Intel-bigdata/HiBench

http://www.odbms.org/wp-content/uploads/2014/07/hibench-wbdb2012-updated.pdf

Date of last description update

31.01.2018

Originating group

Intel

Time – first version, last version

2009 – 2019

Type/Domain

Benchmark Suite

Workload

The suite groups its workloads into 6 categories: micro, ML (machine learning), SQL, graph, websearch, and streaming.

Data type and generation/datasets

Most workloads use synthetic data generated from real data samples. The workloads use structured and semi-structured data, including graph, network, text and web data types.

Technology stack and implementation

HiBench can be executed in Docker containers. Its workloads are implemented for the following frameworks and versions: (1) Hadoop: Apache Hadoop 2.x, CDH5, HDP; (2) Spark: 1.6.x, 2.0.x, 2.1.x, 2.2.x; (3) Flink: 1.0.3; (4) Storm: 1.0.1; (5) Gearpump: 0.8.1; and (6) Kafka: 0.8.2.2.

Metrics

The measured metrics are execution time (latency), throughput, and system resource utilization (CPU, memory, etc.).
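
As a rough illustration of how the first two metrics relate for a batch workload, the sketch below derives throughput from the input size and the measured execution time. It assumes throughput is reported in bytes per second and can also be normalized by the number of cluster nodes. The function names and sample values are hypothetical and are not taken from HiBench itself.

```python
def throughput_bytes_per_sec(input_bytes: int, duration_sec: float) -> float:
    """Bytes processed per second of execution time (assumed definition)."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    return input_bytes / duration_sec


def throughput_per_node(input_bytes: int, duration_sec: float, nodes: int) -> float:
    """Throughput divided evenly across the cluster nodes (assumed normalization)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return throughput_bytes_per_sec(input_bytes, duration_sec) / nodes


if __name__ == "__main__":
    # Hypothetical run: 1,000,000 input bytes processed in 20 seconds on 4 nodes.
    total = throughput_bytes_per_sec(1_000_000, 20.0)
    per_node = throughput_per_node(1_000_000, 20.0, 4)
    print(f"throughput: {total:.1f} bytes/s, per node: {per_node:.1f} bytes/s")
```

Resource utilization (CPU, memory, etc.) is not covered by this sketch, since it has to be sampled by a monitoring tool while the workload runs rather than derived from the run's input size and duration.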

Reported results

--

Reference papers

Huang, Shengsheng, et al. "The HiBench benchmark suite: Characterization of the MapReduce-based data analysis."