Is there any Hive internal process that connects to reduce or map tasks?
Adding to that!
An SQL query gets converted into a MapReduce app by going through the following process: The Hive client or UI submits a query to the driver. The driver then submits the query to the Hive compiler, which generates a query plan and converts the SQL into MapReduce tasks.
The Apache Hadoop is an eco-system which provides an environment which is reliable, scalable and ready for distributed computing. MapReduce is a submodule of this project which is a programming model and is used to process huge datasets which sits on HDFS (Hadoop distributed file system).
Hive allows users to read, write, and manage petabytes of data using SQL. Hive is built on top of Apache Hadoop, which is an open-source framework used to efficiently store and process large datasets. As a result, Hive is closely integrated with Hadoop, and is designed to work quickly on petabytes of data.
Hive is an initiative started by Facebook to provide a traditional Data Warehouse interface for MapReduce programming. For writing queries for MapReduce in SQL fashion, the Hive compiler converts them in the background to be executed in the Hadoop cluster.
For HIVE there is no process to communicate Map/Reduce tasks directly. It's communicates (flow 6.3) with Jobtracker
(Application Master in YARN) only for job processing related things once it got scheduled.
This image will give clear understanding about,
- How HIVE uses MapReduce as execution engine?
- How is the job getting scheduled?
- How does the result return to the driver?
Edit: suggested by dennis-jaheruddin
Hive is typically controlled by means of HQL (Hive Query Language) which is often conveniently abbreviated to Hive.
source
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With