 

Best practice for integrating Kafka and HBase

What are best practices for "importing" streamed data from Kafka into HBase?

The use case is as follows: Vehicle sensor data are streamed to Kafka. Afterwards, these sensor data must be transformed (i.e., deserialized from protobuf into human-readable data) and stored in HBase.

1) Which toolset do you recommend (e.g., Kafka --> Flume --> HBase, Kafka --> Storm --> HBase, Kafka --> Spark Streaming --> HBase, or Kafka --> HBase directly)?

2) What is the best place to do the protobuf deserialization (e.g., within Flume using interceptors)?

Thank you for your support.

Best, Thomas

asked Aug 18 '15 by Thomas Beer



1 Answer

I think you just need to do Kafka -> Storm -> HBase.

Storm: a Storm spout subscribes to the Kafka topic.
Storm bolts can then transform the data (a bolt is also the natural place to do the protobuf deserialization) and write it into HBase.
You can use the HBase Java client API to write data to HBase from a bolt; a sketch follows below.
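Here is a minimal sketch of such a bolt (Java, org.apache.storm packages). VehicleReading is a hypothetical protobuf-generated class, and the table, column family, and field names are placeholders, not anything from your actual setup:

    import java.util.Map;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    public class SensorToHBaseBolt extends BaseRichBolt {

        private transient Connection connection;
        private transient Table table;
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            try {
                // Open one HBase connection per bolt instance, never per tuple.
                connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
                table = connection.getTable(TableName.valueOf("vehicle_sensor"));
            } catch (Exception e) {
                throw new RuntimeException("Could not open HBase connection", e);
            }
        }

        @Override
        public void execute(Tuple tuple) {
            try {
                byte[] payload = tuple.getBinary(0); // raw Kafka message bytes
                // VehicleReading is a hypothetical protobuf-generated class.
                VehicleReading reading = VehicleReading.parseFrom(payload);

                // Row key "vehicleId_timestamp" keeps a vehicle's readings together.
                Put put = new Put(Bytes.toBytes(
                        reading.getVehicleId() + "_" + reading.getTimestamp()));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("speed"),
                        Bytes.toBytes(reading.getSpeed()));
                table.put(put);
                collector.ack(tuple);
            } catch (Exception e) {
                collector.fail(tuple); // failed tuples are replayed by the Kafka spout
            }
        }

        @Override
        public void cleanup() {
            try {
                table.close();
                connection.close();
            } catch (Exception ignored) {
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Terminal bolt: nothing is emitted downstream.
        }
    }

In the topology this bolt would sit directly behind a Kafka spout (e.g., the one provided by the storm-kafka module).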

I suggested Storm because it processes one tuple at a time, whereas Spark Streaming processes micro-batches. However, if you would like to use common infrastructure for batch and stream processing, Spark might be a good choice.

If you end up using Spark, your flow will be Kafka -> Spark Streaming -> HBase.
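For comparison, a minimal Spark Streaming sketch of the same flow, using the spark-streaming-kafka-0-10 direct stream. The broker address, topic, and table names are placeholders, and VehicleReading is again the hypothetical protobuf-generated class:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class SensorStreamToHBase {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("sensor-to-hbase");
            // Each micro-batch covers five seconds of Kafka data.
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker:9092"); // placeholder
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", ByteArrayDeserializer.class);
            kafkaParams.put("group.id", "sensor-ingest");

            JavaInputDStream<ConsumerRecord<String, byte[]>> stream =
                    KafkaUtils.createDirectStream(
                            jssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, byte[]>Subscribe(
                                    Collections.singletonList("vehicle-sensors"), kafkaParams));

            stream.foreachRDD(rdd -> rdd.foreachPartition(records -> {
                // Open one HBase connection per partition, not per record.
                try (Connection hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
                     Table table = hbase.getTable(TableName.valueOf("vehicle_sensor"))) {
                    while (records.hasNext()) {
                        byte[] payload = records.next().value();
                        // VehicleReading is a hypothetical protobuf-generated class.
                        VehicleReading reading = VehicleReading.parseFrom(payload);
                        Put put = new Put(Bytes.toBytes(
                                reading.getVehicleId() + "_" + reading.getTimestamp()));
                        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("speed"),
                                Bytes.toBytes(reading.getSpeed()));
                        table.put(put);
                    }
                }
            }));

            jssc.start();
            jssc.awaitTermination();
        }
    }

Note the same protobuf deserialization and HBase write logic as in the Storm bolt; only the surrounding framework changes.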

answered Sep 20 '22 by Anil Gupta