
Relationship between Hive and Hadoop MapReduce?

Is there any Hive internal process that connects to reduce or map tasks?

In addition:

  • How does Hive work in relation to MapReduce?
  • How is the job getting scheduled?
  • How does the query result return to the hive driver?
arya asked Nov 09 '16 15:11


People also ask

What is the relationship between hive and MapReduce?

An SQL query gets converted into a MapReduce app by going through the following process: The Hive client or UI submits a query to the driver. The driver then submits the query to the Hive compiler, which generates a query plan and converts the SQL into MapReduce tasks.

What is the relationship between MapReduce and Hadoop?

Apache Hadoop is an ecosystem that provides a reliable, scalable environment for distributed computing. MapReduce is a submodule of this project: a programming model used to process huge datasets that sit on HDFS (the Hadoop Distributed File System).

What is the relation between hive and Hadoop?

Hive allows users to read, write, and manage petabytes of data using SQL. Hive is built on top of Apache Hadoop, which is an open-source framework used to efficiently store and process large datasets. As a result, Hive is closely integrated with Hadoop, and is designed to work quickly on petabytes of data.

Does hive use MapReduce?

Hive is an initiative started by Facebook to provide a traditional data warehouse interface for MapReduce programming. Users write queries in an SQL-like style, and the Hive compiler converts them in the background into MapReduce jobs that run on the Hadoop cluster.


1 Answer

Hive has no process that communicates with map or reduce tasks directly. It communicates (flow 6.3) only with the JobTracker (the Application Master in YARN), and only for job-processing matters once the job has been scheduled.


The image below gives a clear picture of:

  1. How does Hive use MapReduce as its execution engine?
  2. How is the job getting scheduled?
  3. How does the result return to the driver?

Hive design
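
To make the flow above concrete, here is a minimal toy sketch in Python. It is not Hive code: `ToyDriver`, `ToyJobTracker`, the `emp` table, and the hard-coded query are all hypothetical names invented for illustration. It only models the idea that the driver compiles the query into map and reduce stages, hands the job to the tracker, asks the tracker about the job as a whole, and then fetches the output, without ever talking to an individual map or reduce task.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJobTracker:
    """Hypothetical stand-in for the JobTracker / YARN Application Master."""
    jobs: dict = field(default_factory=dict)

    def submit(self, plan, rows):
        # The tracker (not the Hive driver) runs the map and reduce tasks.
        mapped = [(row["dept"], 1) for row in rows]   # map phase
        counts = {}
        for key, value in mapped:                     # reduce phase
            counts[key] = counts.get(key, 0) + value
        job_id = len(self.jobs) + 1
        self.jobs[job_id] = {"state": "SUCCEEDED", "plan": plan,
                             "output": sorted(counts.items())}
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]["state"]

    def read_output(self, job_id):
        # Stands in for reading the job's output files from HDFS.
        return self.jobs[job_id]["output"]


class ToyDriver:
    """Hypothetical stand-in for the Hive driver."""

    def __init__(self, tracker):
        self.tracker = tracker

    def compile(self, query):
        # A real compiler parses HQL and builds a plan; this toy version
        # only recognises one hard-coded GROUP BY query.
        if "GROUP BY dept" not in query:
            raise ValueError("toy compiler only handles GROUP BY dept")
        return ["map: emit (dept, 1)", "reduce: sum per dept"]

    def run(self, query, rows):
        plan = self.compile(query)
        job_id = self.tracker.submit(plan, rows)
        # The driver only asks the tracker about the job as a whole.
        if self.tracker.status(job_id) != "SUCCEEDED":
            raise RuntimeError("job failed")
        return self.tracker.read_output(job_id)


rows = [{"dept": "ops"}, {"dept": "dev"}, {"dept": "ops"}]
driver = ToyDriver(ToyJobTracker())
result = driver.run("SELECT dept, COUNT(*) FROM emp GROUP BY dept", rows)
print(result)  # [('dev', 1), ('ops', 2)]
```

In this sketch the job finishes synchronously inside `submit`; on a real cluster the job runs asynchronously and the driver would poll for its status until it completes.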

Edit: suggested by dennis-jaheruddin

Hive is typically controlled by means of HQL (Hive Query Language), which is often conveniently abbreviated to Hive.

source

mrsrinivas answered Oct 05 '22 22:10
