Spark with HBASE vs Spark with HDFS

I know that HBase is a column-oriented database that stores structured table data in HDFS by column rather than by row. I also know that Spark can read from and write to HDFS, and that there is an HBase connector for Spark that can read and write HBase tables.

Questions:

1) What capabilities do you gain by layering Spark on top of HBase instead of using HBase alone? Does it depend only on the programmer's skills, or is there a performance reason to do so? Are there things Spark can do that HBase alone cannot?

2) Following from the previous question, when should you add HBase between HDFS and Spark instead of using HDFS directly?

asked Aug 13 '16 08:08

Johan

1 Answer

1) What capabilities do you gain by layering Spark on top of HBase instead of using HBase alone? Does it depend only on the programmer's skills, or is there a performance reason to do so? Are there things Spark can do that HBase alone cannot?

At Splice Machine, we use Spark for analytics on top of HBase. HBase has no execution engine of its own, and Spark provides a capable one on top of it (intermediate results, relational algebra, etc.). HBase is an MVCC storage structure and Spark is an execution engine, so they are natural complements to one another.
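To make the storage/execution split concrete, here is a minimal pure-Python sketch. It is a toy model, not the HBase or Spark API: `ToyKeyValueStore` and `group_sum` are hypothetical names invented for illustration. The store only knows put, get, and range scan; the aggregation (and its intermediate results) lives in a separate layer, which is roughly the role Spark plays.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class ToyKeyValueStore:
    """Hypothetical stand-in for a sorted key-value store (not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> dict of column -> value

    def put(self, key, columns):
        # Insert a new row key in sorted position, then merge the columns.
        if key not in self._rows:
            insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def get(self, key):
        # Point lookup by row key; returns None for a missing row.
        return self._rows.get(key)

    def scan(self, start, stop):
        """Yield (key, columns) for start <= key < stop, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


def group_sum(rows, group_col, value_col):
    """Execution-layer step: GROUP BY group_col, SUM(value_col)."""
    totals = defaultdict(int)   # intermediate results live here, not in the store
    for _key, cols in rows:
        totals[cols[group_col]] += cols[value_col]
    return dict(totals)


store = ToyKeyValueStore()
store.put("order#001", {"region": "EU", "amount": 10})
store.put("order#002", {"region": "US", "amount": 25})
store.put("order#003", {"region": "EU", "amount": 5})

# The store can only hand back rows; the grouping happens outside it.
totals = group_sum(store.scan("order#", "order#~"), "region", "amount")
print(totals)  # {'EU': 15, 'US': 25}
```

Nothing in the store class knows how to group, join, or aggregate. That is the gap an engine like Spark fills, in a distributed way and at a much larger scale than this sketch suggests.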

2) Following from the previous question, when should you add HBase between HDFS and Spark instead of using HDFS directly?

When you have small reads, concurrent read/write patterns, or incremental updates (which covers most ETL).
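A rough way to see why incremental updates favor HBase: files in HDFS are write-once (append at best), so changing one record in a plain file means writing a new copy of the file, while a keyed store can address and rewrite a single row. The sketch below is a toy model under that assumption, with hypothetical function names and made-up data; it uses neither HDFS nor HBase.

```python
def update_in_immutable_file(records, key, new_value):
    """Model a plain HDFS file: no in-place edit, so copy every record into a new file."""
    new_file = []
    records_written = 0
    for k, v in records:
        new_file.append((k, new_value if k == key else v))
        records_written += 1
    return new_file, records_written


def update_in_keyed_store(table, key, new_value):
    """Model a keyed store: address the row by key and write only that row."""
    table[key] = new_value
    return 1  # records written


# Hypothetical data set: 1,000 user rows, one of which changes.
file_records = [(f"user{i:04d}", 0) for i in range(1000)]
keyed_table = dict(file_records)

new_file, file_writes = update_in_immutable_file(file_records, "user0042", 7)
store_writes = update_in_keyed_store(keyed_table, "user0042", 7)

print(file_writes, store_writes)  # 1000 1
```

The counts only illustrate write amplification in this toy. Real HBase also buffers writes in memory and compacts its files in the background, so treat this as intuition rather than a benchmark.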

Good luck...

answered Oct 31 '22 16:10

John Leach

Tags: apache-spark, hadoop, hbase, hdfs
