What are best practices for "importing" streamed data from Kafka into HBase?
The use case is as follows: vehicle sensor data are streamed to Kafka. Afterwards, these sensor data must be transformed (i.e., deserialized from protobuf into human-readable data) and stored in HBase.
1) Which toolset do you recommend (e.g., Kafka --> Flume --> HBase, Kafka --> Storm --> HBase, Kafka --> Spark Streaming --> HBase, or Kafka --> HBase directly)?
2) What is the best place for doing the protobuf deserialization (e.g., within Flume using interceptors)?
Thank you for your support.
Best, Thomas
I think you just need to do Kafka -> Storm -> HBase.
Storm: a Storm spout (e.g., KafkaSpout) subscribes to the Kafka topic.
Then Storm bolts can transform the data and write it into HBase.
You can use the HBase client API in Java to write data to HBase from Storm.
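As a sketch of the per-tuple work such a bolt would do, here is a plain-Java stand-in (the Storm and HBase calls appear only in comments so the snippet is self-contained; `VehicleEvent`, `deserialize`, and `buildRowKey` are hypothetical names, and `deserialize` stands in for a generated protobuf `parseFrom`):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the per-tuple work a Storm bolt would do between Kafka and HBase.
// In a real topology this logic would live in a class extending
// org.apache.storm.topology.base.BaseBasicBolt, and the write would use
// org.apache.hadoop.hbase.client.Put via Table.put(...).
public class VehicleBoltSketch {

    // Hypothetical decoded record; with protobuf this would be a generated class.
    static final class VehicleEvent {
        final String vehicleId;
        final long timestampMs;
        final double speedKmh;
        VehicleEvent(String vehicleId, long timestampMs, double speedKmh) {
            this.vehicleId = vehicleId;
            this.timestampMs = timestampMs;
            this.speedKmh = speedKmh;
        }
    }

    // Stand-in for VehicleEvent.parseFrom(bytes); here we just read a fixed
    // layout: 8-byte timestamp, 8-byte speed, then a UTF-8 vehicle id.
    static VehicleEvent deserialize(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        long ts = buf.getLong();
        double speed = buf.getDouble();
        byte[] idBytes = new byte[buf.remaining()];
        buf.get(idBytes);
        return new VehicleEvent(new String(idBytes, StandardCharsets.UTF_8), ts, speed);
    }

    // HBase rows are sorted by key; prefixing with the vehicle id groups one
    // vehicle's readings, and the reversed timestamp puts newest rows first.
    static String buildRowKey(String vehicleId, long timestampMs) {
        return vehicleId + "#" + (Long.MAX_VALUE - timestampMs);
    }

    public static void main(String[] args) {
        // Simulate one Kafka message arriving at the bolt.
        byte[] payload = ByteBuffer.allocate(8 + 8 + 6)
                .putLong(1_700_000_000_000L)
                .putDouble(87.5)
                .put("veh-42".getBytes(StandardCharsets.UTF_8))
                .array();
        VehicleEvent event = deserialize(payload);
        String rowKey = buildRowKey(event.vehicleId, event.timestampMs);
        // Real bolt: Put put = new Put(Bytes.toBytes(rowKey));
        //            put.addColumn(cf, Bytes.toBytes("speed"), Bytes.toBytes(event.speedKmh));
        //            table.put(put);
        System.out.println(rowKey + " speed=" + event.speedKmh);
    }
}
```

Doing the protobuf deserialization inside the bolt (rather than, say, a Flume interceptor) keeps the raw bytes on the wire until the last moment and lets the same bolt validate, transform, and key the record in one place.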
I suggested Storm because it processes one tuple at a time, whereas Spark Streaming processes micro-batches. However, if you would like to use common infrastructure for batch and stream processing, then Spark might be a good choice.
If you end up using Spark, then your flow will be Kafka -> Spark Streaming -> HBase.
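On the HBase side, the practical consequence of micro-batching is that it pays to buffer mutations and write each batch in one call (the HBase client's `Table.put` accepts a `List<Put>`) instead of issuing one RPC per record. A stdlib-only sketch of that buffering pattern, with `flush` standing in for the actual HBase call (all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Buffers "mutations" and flushes them in groups, mimicking how a Spark
// Streaming job would write each micro-batch to HBase with a single
// Table.put(List<Put>) call rather than one RPC per record.
public class MicroBatchWriterSketch {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0;  // how many batched writes we issued

    MicroBatchWriterSketch(int batchSize) { this.batchSize = batchSize; }

    void add(String mutation) {
        buffer.add(mutation);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        // Real job: table.put(puts); here we just count the batched write.
        flushes++;
        buffer.clear();
    }

    int flushCount() { return flushes; }

    public static void main(String[] args) {
        MicroBatchWriterSketch writer = new MicroBatchWriterSketch(100);
        for (int i = 0; i < 250; i++) writer.add("row-" + i);
        writer.flush();  // flush the partial final batch
        System.out.println("flushes=" + writer.flushCount());  // 3 batched writes for 250 records
    }
}
```

The same batching trick is worth applying in a Storm bolt too (e.g., flushing on a tick tuple), but with micro-batches the batch boundary is given to you for free.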