What are best practices for "importing" streamed data from Kafka into HBase?
The use case is as follows: vehicle sensor data are streamed to Kafka. Afterwards, these sensor data must be transformed (i.e., deserialized from protobuf into human-readable data) and stored in HBase.
1) Which toolset do you recommend (e.g., Kafka --> Flume --> HBase, Kafka --> Storm --> HBase, Kafka --> Spark Streaming --> HBase, or Kafka --> HBase directly)?
2) What is the best place for doing the protobuf deserialization (e.g., within Flume using interceptors)?
Thank you for your support.
Best, Thomas
I think you just need to do Kafka -> Storm -> HBase.
Storm: a Storm spout (e.g., KafkaSpout) subscribes to the Kafka topic.
Then Storm bolts can transform the data and write it into HBase.
You can use the HBase client API in Java to write data to HBase from Storm.
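As a sketch of the per-tuple work such a bolt would do, here is a plain-Java stand-in (the Storm and HBase calls appear only in comments so the snippet is self-contained; `VehicleEvent`, `deserialize`, and `buildRowKey` are hypothetical names, and `deserialize` stands in for a generated protobuf `parseFrom`):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the per-tuple work a Storm bolt would do between Kafka and HBase.
// In a real topology this logic would live in a class extending
// org.apache.storm.topology.base.BaseBasicBolt, and the write would use
// org.apache.hadoop.hbase.client.Put via Table.put(...).
public class VehicleBoltSketch {

    // Hypothetical decoded record; with protobuf this would be a generated class.
    static final class VehicleEvent {
        final String vehicleId;
        final long timestampMs;
        final double speedKmh;
        VehicleEvent(String vehicleId, long timestampMs, double speedKmh) {
            this.vehicleId = vehicleId;
            this.timestampMs = timestampMs;
            this.speedKmh = speedKmh;
        }
    }

    // Stand-in for VehicleEvent.parseFrom(bytes); here we just read a fixed
    // layout: 8-byte timestamp, 8-byte speed, then a UTF-8 vehicle id.
    static VehicleEvent deserialize(byte[] payload) {
        ByteBuffer buf = ByteBuffer.wrap(payload);
        long ts = buf.getLong();
        double speed = buf.getDouble();
        byte[] idBytes = new byte[buf.remaining()];
        buf.get(idBytes);
        return new VehicleEvent(new String(idBytes, StandardCharsets.UTF_8), ts, speed);
    }

    // HBase rows are sorted by key; prefixing with the vehicle id groups one
    // vehicle's readings, and the reversed timestamp puts newest rows first.
    static String buildRowKey(String vehicleId, long timestampMs) {
        return vehicleId + "#" + (Long.MAX_VALUE - timestampMs);
    }

    public static void main(String[] args) {
        // Simulate one Kafka message arriving at the bolt.
        byte[] payload = ByteBuffer.allocate(8 + 8 + 6)
                .putLong(1_700_000_000_000L)
                .putDouble(87.5)
                .put("veh-42".getBytes(StandardCharsets.UTF_8))
                .array();
        VehicleEvent event = deserialize(payload);
        String rowKey = buildRowKey(event.vehicleId, event.timestampMs);
        // Real bolt: Put put = new Put(Bytes.toBytes(rowKey));
        //            put.addColumn(cf, Bytes.toBytes("speed"), Bytes.toBytes(event.speedKmh));
        //            table.put(put);
        System.out.println(rowKey + " speed=" + event.speedKmh);
    }
}
```

Doing the protobuf deserialization inside the bolt (rather than, say, a Flume interceptor) keeps the raw bytes on the wire until the last moment and lets the same bolt validate, transform, and key the record in one place.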
I suggested Storm because it processes one tuple at a time, whereas Spark Streaming processes micro-batches. However, if you would like to use common infrastructure for batch and stream processing, then Spark might be a good choice.
If you end up using Spark, then your flow will be Kafka -> Spark Streaming -> HBase.
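On the HBase side, the practical consequence of micro-batching is that it pays to buffer mutations and write each batch in one call (the HBase client's `Table.put` accepts a `List<Put>`) instead of issuing one RPC per record. A stdlib-only sketch of that buffering pattern, with `flush` standing in for the actual HBase call (all names here are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Buffers "mutations" and flushes them in groups, mimicking how a Spark
// Streaming job would write each micro-batch to HBase with a single
// Table.put(List<Put>) call rather than one RPC per record.
public class MicroBatchWriterSketch {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private int flushes = 0;  // how many batched writes we issued

    MicroBatchWriterSketch(int batchSize) { this.batchSize = batchSize; }

    void add(String mutation) {
        buffer.add(mutation);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {
        if (buffer.isEmpty()) return;
        // Real job: table.put(puts); here we just count the batched write.
        flushes++;
        buffer.clear();
    }

    int flushCount() { return flushes; }

    public static void main(String[] args) {
        MicroBatchWriterSketch writer = new MicroBatchWriterSketch(100);
        for (int i = 0; i < 250; i++) writer.add("row-" + i);
        writer.flush();  // flush the partial final batch
        System.out.println("flushes=" + writer.flushCount());  // 3 batched writes for 250 records
    }
}
```

The same batching trick is worth applying in a Storm bolt too (e.g., flushing on a tick tuple), but with micro-batches the batch boundary is given to you for free.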