One requirement of the PoC was to use Hortonworks Hadoop. Within this constraint, we were free to choose the frameworks. All frameworks used are included in the standard installation of Hortonworks Hadoop.
Kafka
Kafka was used to load the data from the sensors into the Hadoop environment. Kafka is a distributed streaming platform that is well suited to real-time streaming applications. It is also lightweight and fault-tolerant, because the data in each topic can be replicated across several brokers.
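The following is a minimal sketch of the sensor side, assuming readings are sent as JSON to a hypothetical topic named `sensor-data` and that the producer is a kafka-python `KafkaProducer`. The record layout (`sensor_id`, `value`, `ts`) and the helper names are illustrative assumptions; the PoC's actual format is not described. The producer is passed in as a parameter so the encoding logic can be tested without a broker.

```python
import json
from typing import Any, Iterable, Tuple

# Hypothetical topic name; the PoC's real topic is not specified.
SENSOR_TOPIC = "sensor-data"


def encode_reading(sensor_id: str, value: float, ts_ms: int) -> Tuple[bytes, bytes]:
    """Serialise one sensor reading as (key, value) bytes.

    Using sensor_id as the key means that, with Kafka's default
    partitioner, all readings of one sensor land in the same partition
    and therefore keep their order.
    """
    key = sensor_id.encode("utf-8")
    payload = json.dumps(
        {"sensor_id": sensor_id, "value": value, "ts": ts_ms},
        separators=(",", ":"),
    ).encode("utf-8")
    return key, payload


def publish_readings(producer: Any,
                     readings: Iterable[Tuple[str, float, int]],
                     topic: str = SENSOR_TOPIC) -> int:
    """Send readings through any object with a kafka-python style
    send(topic, value=..., key=...) and flush() method.

    Returns the number of records handed to the producer.
    """
    count = 0
    for sensor_id, value, ts_ms in readings:
        key, payload = encode_reading(sensor_id, value, ts_ms)
        producer.send(topic, value=payload, key=key)
        count += 1
    producer.flush()  # block until all buffered records have been sent
    return count
```

Against a real cluster (requires the kafka-python package and a reachable broker; the address is a placeholder):

```python
# from kafka import KafkaProducer
# producer = KafkaProducer(bootstrap_servers="broker:9092")
# publish_readings(producer, [("s1", 21.5, 1700000000000)])
```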
Flume
For the data from Kafka to be available in the Hadoop file system, it must be transferred from the Kafka topic into HDFS. Flume is used for this step.
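A Flume agent for this step chains a Kafka source, a channel and an HDFS sink. The sketch below renders such an agent definition as a properties file from Python. It assumes the property names of the Kafka source in Flume 1.7 and later (`kafka.bootstrap.servers`, `kafka.topics`); older Flume releases use different names, so check them against the Flume version bundled with your Hortonworks release. The component names, the memory channel sizes and the HDFS path `/data/sensors` are hypothetical choices, not the PoC's actual configuration.

```python
from typing import Dict


def flume_kafka_to_hdfs_config(agent: str, bootstrap_servers: str,
                               topic: str, hdfs_path: str) -> str:
    """Render a Flume agent: Kafka source -> memory channel -> HDFS sink."""
    props: Dict[str, str] = {
        f"{agent}.sources": "kafkaSrc",
        f"{agent}.channels": "memCh",
        f"{agent}.sinks": "hdfsSink",
        # Source: consume events from the Kafka topic.
        f"{agent}.sources.kafkaSrc.type": "org.apache.flume.source.kafka.KafkaSource",
        f"{agent}.sources.kafkaSrc.kafka.bootstrap.servers": bootstrap_servers,
        f"{agent}.sources.kafkaSrc.kafka.topics": topic,
        f"{agent}.sources.kafkaSrc.channels": "memCh",
        # Channel: buffer events in memory (fast, but lost if the agent dies).
        # transactionCapacity must be at least the source's batch size.
        f"{agent}.channels.memCh.type": "memory",
        f"{agent}.channels.memCh.capacity": "10000",
        f"{agent}.channels.memCh.transactionCapacity": "1000",
        # Sink: write raw event bodies as plain files into HDFS,
        # starting a new file every 300 s (size/count rolling disabled).
        f"{agent}.sinks.hdfsSink.type": "hdfs",
        f"{agent}.sinks.hdfsSink.channel": "memCh",
        f"{agent}.sinks.hdfsSink.hdfs.path": hdfs_path,
        f"{agent}.sinks.hdfsSink.hdfs.fileType": "DataStream",
        f"{agent}.sinks.hdfsSink.hdfs.rollInterval": "300",
        f"{agent}.sinks.hdfsSink.hdfs.rollSize": "0",
        f"{agent}.sinks.hdfsSink.hdfs.rollCount": "0",
    }
    return "\n".join(f"{k} = {v}" for k, v in props.items()) + "\n"


if __name__ == "__main__":
    print(flume_kafka_to_hdfs_config("a1", "broker:9092", "sensor-data", "/data/sensors"))
```

The generated text can be saved to a file and used to start the agent, for example with `flume-ng agent --conf <conf-dir> --conf-file sensors.properties --name a1`. A memory channel keeps the sketch simple; a file channel would survive agent restarts at the cost of throughput.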
Hive
Hive is used because visualisation frameworks can access the data more easily when it can be queried with SQL. Hive allows the data in HDFS to be queried using familiar SQL syntax.
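One way to expose the Flume output to Hive is an external table over the HDFS directory, so Hive reads the files in place without copying them. The sketch below assumes the JSON records and the hypothetical `/data/sensors` directory from the earlier examples, and uses Hive's `org.apache.hive.hcatalog.data.JsonSerDe`, which may require the hive-hcatalog-core jar to be on Hive's classpath. The table name, columns and helper functions are illustrative. Any DB-API cursor connected to HiveServer2 (for example from PyHive) can be passed in.

```python
from typing import Any, List, Tuple

# Hypothetical table over the directory Flume writes to.
CREATE_TABLE = """
CREATE EXTERNAL TABLE IF NOT EXISTS sensor_readings (
    sensor_id STRING,
    `value` DOUBLE,
    ts BIGINT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/data/sensors'
"""

# Example aggregate a dashboard might request.
AVG_PER_SENSOR = (
    "SELECT sensor_id, AVG(`value`) AS avg_value, COUNT(*) AS n "
    "FROM sensor_readings GROUP BY sensor_id"
)


def ensure_table(cursor: Any) -> None:
    """Create the external table; IF NOT EXISTS makes this idempotent."""
    cursor.execute(CREATE_TABLE)


def average_per_sensor(cursor: Any) -> List[Tuple]:
    """Return (sensor_id, avg_value, n) rows computed by Hive."""
    ensure_table(cursor)
    cursor.execute(AVG_PER_SENSOR)
    return cursor.fetchall()
```

With PyHive installed, a cursor could be obtained via `from pyhive import hive` and `hive.connect(host="<hiveserver2-host>", port=10000).cursor()`; the host is a placeholder.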
Visualisation framework
Any visualisation framework the user is comfortable with can be used, provided it can connect to Hive. Zeppelin was used for the PoC; alternatives include SAP Lumira Discovery and Tableau.