HiBench
Description
HiBench is a comprehensive benchmark suite consisting of multiple workloads, including both synthetic micro-benchmarks and real-world applications. It features several ready-to-use benchmarks from four categories: micro benchmarks, web search, machine learning, and HDFS benchmarks. It is used for both stream and batch processing.
Web references
https://github.com/Intel-bigdata/HiBench
http://www.odbms.org/wp-content/uploads/2014/07/hibench-wbdb2012-updated.pdf
Date of last description update
31.01.2018
Originating group
Intel
Time – first version, last version
2009 – 2019
Type/Domain
Benchmark Suite
Workload
The suite comprises six workload categories: micro, ML (machine learning), SQL, graph, websearch, and streaming.
Data type and generation/datasets
Most workloads use synthetic data generated from real data samples. The workloads use structured and semi-structured data, including graph, network, text and web data types.
Technology stack and implementation
HiBench can be executed in Docker containers. It runs against the following technologies and versions: (1) Hadoop: Apache Hadoop 2.x, CDH5, HDP; (2) Spark: Spark 1.6.x, Spark 2.0.x, Spark 2.1.x, Spark 2.2.x; (3) Flink: 1.0.3; (4) Storm: 1.0.1; (5) Gearpump: 0.8.1; and (6) Kafka: 0.8.2.2.
Metrics
The measured metrics are execution time (latency), throughput, and system resource utilization (CPU, memory, etc.).
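To make the relationship between the first two metrics concrete, the following minimal sketch derives throughput from an input size and a measured execution time. It is an illustration only and not HiBench's own reporting code: the `RunResult` class, its field names, and the example figures are all hypothetical, and throughput is assumed here to mean input bytes processed per second.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical record of one benchmark run (not a HiBench data structure)."""
    workload: str
    input_bytes: int      # size of the input data set, in bytes
    duration_s: float     # measured execution time, in seconds
    nodes: int = 1        # number of worker nodes used for the run

    def __post_init__(self) -> None:
        # Reject values that would make the derived metrics meaningless.
        if self.duration_s <= 0:
            raise ValueError("duration_s must be positive")
        if self.nodes < 1:
            raise ValueError("nodes must be at least 1")

    @property
    def throughput(self) -> float:
        """Aggregate throughput in bytes per second (assumed definition)."""
        return self.input_bytes / self.duration_s

    @property
    def throughput_per_node(self) -> float:
        """Throughput divided evenly across the worker nodes."""
        return self.throughput / self.nodes


# Made-up example figures, chosen only to keep the arithmetic easy to follow.
run = RunResult("wordcount", input_bytes=3_200_000_000, duration_s=40.0, nodes=4)
print(f"{run.workload}: {run.throughput:.0f} bytes/s, "
      f"{run.throughput_per_node:.0f} bytes/s per node")
```

Resource utilization (CPU, memory) is not covered by this sketch, since it has to be sampled on each node during the run rather than derived from the run's totals.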
Reported results
--
Reference papers