Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Purpose of using HBase in Hadoop instead of Hive [duplicate]

In my project, we are using Hadoop 2, Spark, Scala. Scala is the programming language and Spark is using here for analysing. we are using Hive and HBase both. I can access all details like file etc. of HDFS using Hive. But my confusions are -

  1. When I can able to performed all jobs using Hive, Then why HBase is required to store the data. Is it not an overhead?
  2. What are the functionality of HIVE and HBase?
  3. If we only used Hive, Then what should be the problem?

Can anyone please let me know.

like image 870
Avijit Avatar asked Dec 24 '22 22:12

Avijit


2 Answers

  1. When I can able to performed all jobs using Hive, Then why HBASE is required to store the data. Is it not a overhead?
  2. What are the functionality of Hive and Hbase

HBase is No Sql database which stores the data in key value pair. Hive has integration with Hbase.Hbase HIve Integration

Advantage :- Hive queries over HBase. Think joins and a easy way to do aggregates and simple operations on your Hbase data. Hbase gives you a scalable storage infrastructure that keeps data online. StumbleUpon uses Hbase for their live website. Hive is not a real-time query engine, so its data store could not be used for similar purposes. Hive over HBase gives you the benefit of both worlds.

  1. If we only used Hive, Then what should be the problem?

If we will use Hive There is no problem . But in project there so many scenarios we have to consider .

  • Performance
  • Storage
  • Stability of used technology
  • Compatibility (Hive ware house is easily accessible for most of the Tools in Hadoop)
like image 103
Indrajit Swain Avatar answered Jan 26 '23 04:01

Indrajit Swain


When I can able to performed all jobs using Hive, Then why HBase is required to store the data. Is it not an overhead?

I can't say it's overhead or not. But HBase responds to requests in real-time as its database when it comes to Hive it runs jobs on MapReduce/Spark/Tez engines.

What are the functionality of Hive and HBase?

Hive:

It's a SQL-like language that gets translated into MapReduce/Spark/Tez jobs. it only runs batch processes on Hadoop. for more check this how Hive queries run on MapReduce engine

HBase:

It's key/value store database which runs on top of HDFS/S3(on AWS). It does real-time operations for requests.

If we only used Hive, Then what should be the problem?

As discussed If the query needs to process in real-time then HBase is the choice over Hive.

like image 36
mrsrinivas Avatar answered Jan 26 '23 06:01

mrsrinivas