Purpose of using HBase in Hadoop instead of Hive [duplicate]

Question

In my project we use Hadoop 2, Spark, and Scala. Scala is the programming language and Spark is used for analysis. We use both Hive and HBase. I can access everything in HDFS (files and so on) through Hive. My points of confusion are:

If I can perform all jobs using Hive, why is HBase required to store the data? Is it not an overhead?
What is the functionality of Hive and HBase?
If we used only Hive, what would the problem be?

Can anyone please let me know?

Indrajit Swain · Accepted Answer

If I can perform all jobs using Hive, why is HBase required to store the data? Is it not an overhead?

What is the functionality of Hive and HBase?

HBase is a NoSQL database that stores data as key/value pairs. Hive integrates with HBase (see "HBase Hive Integration").

Advantage: Hive queries over HBase. Think joins and an easy way to do aggregates and simple operations on your HBase data. HBase gives you a scalable storage infrastructure that keeps data online. StumbleUpon uses HBase for its live website. Hive is not a real-time query engine, so its data store could not be used for similar purposes. Hive over HBase gives you the best of both worlds.
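To make "Hive queries over HBase" a bit more concrete, here is a rough Python sketch of the idea, not the integration itself. It assumes a toy dictionary in place of an HBase table, and a column mapping loosely modeled on the `:key,family:qualifier` style used by the `hbase.columns.mapping` property of the Hive HBase storage handler. All table contents, column names, and function names are made up for illustration.

```python
from collections import defaultdict

# Toy stand-in for an HBase table: row key -> {"family:qualifier": value}.
# All keys and values here are invented for illustration.
hbase_rows = {
    "user#001": {"info:city": "Pune", "stats:visits": "12"},
    "user#002": {"info:city": "Delhi", "stats:visits": "7"},
    "user#003": {"info:city": "Pune", "stats:visits": "5"},
}

# Hypothetical mapping from key/value cells to SQL-style columns.
column_mapping = [":key", "info:city", "stats:visits"]
column_names = ["user_id", "city", "visits"]


def as_table(rows, mapping, names):
    """Flatten key/value cells into SQL-style rows (dicts keyed by column name)."""
    table = []
    for row_key in sorted(rows):  # rows are visited in row-key order
        cells = rows[row_key]
        record = {}
        for source, name in zip(mapping, names):
            record[name] = row_key if source == ":key" else cells.get(source)
        table.append(record)
    return table


def visits_by_city(table):
    """Rough equivalent of: SELECT city, SUM(visits) FROM t GROUP BY city."""
    totals = defaultdict(int)
    for record in table:
        totals[record["city"]] += int(record["visits"])
    return dict(totals)


table = as_table(hbase_rows, column_mapping, column_names)
totals = visits_by_city(table)
```

Once the cells are viewed as rows with named columns, aggregates and joins become ordinary SQL-style operations, which is the convenience the answer is pointing at.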

If we used only Hive, what would the problem be?

There is no problem with using only Hive. But in a project there are many factors to consider:

Performance
Storage
Stability of used technology
Compatibility (the Hive warehouse is easily accessible to most tools in the Hadoop ecosystem)

mrsrinivas · Answer

If I can perform all jobs using Hive, why is HBase required to store the data? Is it not an overhead?

I can't say whether it is overhead or not. But HBase responds to requests in real time because it is a database, whereas Hive runs jobs on the MapReduce/Spark/Tez engines.

What is the functionality of Hive and HBase?

Hive:

Hive provides a SQL-like language (HiveQL) whose queries are translated into MapReduce/Spark/Tez jobs. It only runs batch processes on Hadoop. For more, check how Hive queries run on the MapReduce engine.
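As a rough illustration of what "translated into MapReduce jobs" means, the sketch below runs something like `SELECT page, COUNT(*) ... GROUP BY page` as explicit map, shuffle, and reduce phases in plain Python. It is a simplified model under stated assumptions, not what Hive actually generates: the input records and function names are hypothetical, and a real job would spread these phases over many machines.

```python
from collections import defaultdict

# Hypothetical input: one (page, user) record per page view.
page_views = [
    ("home", "u1"),
    ("cart", "u2"),
    ("home", "u3"),
    ("home", "u1"),
    ("cart", "u1"),
]


def map_phase(records):
    """Emit a (key, 1) pair per record, like a mapper for COUNT(*) ... GROUP BY page."""
    for page, _user in records:
        yield page, 1


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


counts = reduce_phase(shuffle_phase(map_phase(page_views)))
```

Every record passes through all three phases even for a small question, which is why this style of execution suits batch work rather than low-latency lookups.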

HBase:

It's a key/value store database that runs on top of HDFS (or S3 on AWS). It serves requests in real time.
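The access pattern behind those real-time operations can be sketched with a small in-memory model. This is a toy, assuming only that rows are kept sorted by key. It is not the HBase client API, and the class name, method names, and row keys are invented for illustration.

```python
import bisect


class SortedKeyValueStore:
    """Toy in-memory model of a store that keeps rows sorted by key."""

    def __init__(self):
        self._keys = []    # row keys, kept in sorted order
        self._values = {}  # row key -> value

    def put(self, key, value):
        """Insert a new row or overwrite an existing one."""
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        """Point lookup by row key; other rows are not touched."""
        return self._values.get(key)

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = SortedKeyValueStore()
store.put("order#2024-01-03", "shipped")
store.put("order#2024-01-01", "new")
store.put("order#2024-02-01", "new")
store.put("order#2024-01-01", "paid")  # overwrites the earlier value

latest = store.get("order#2024-01-01")
january = store.scan("order#2024-01", "order#2024-02")
```

A lookup or short range scan touches only the rows it needs, in contrast to a batch job that reads the whole data set.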

If we used only Hive, what would the problem be?

As discussed, if a query needs to be answered in real time, HBase is the choice over Hive.

Tags: hadoop, hive, hbase, hdfs

Avijit
