<p>I am trying to understand the architecture of <code>hive</code>, and I am referring to Tom White's book on Hadoop.</p> <p>I came across the following terms with regard to Hive: <code>Hive Services</code>, <code>hiveserver2</code>, <code>metastore</code>, among others.</p> <p>Below are the relevant diagrams from the book (Hadoop: The Definitive Guide).</p> <h3>Hive Architecture:</h3> <p><img src="https://i.stack.imgur.com/17mZy.png" alt="enter image description here"></p> <h3>MetaStore configuration:</h3> <p><img src="https://i.stack.imgur.com/vZjQY.png" alt="enter image description here"></p> <h3>Hive Architecture which shows what "Driver" is:</h3> <p><img src="https://i.stack.imgur.com/9MfZM.png" alt="enter image description here"></p> <p>I am not able to understand the following:</p> <p>1) What is <code>Hive Services</code> in the Hive architecture diagram? Is it the same thing as <code>hiveserver2</code>?</p> <p>2) What is the <code>Driver</code> in the Hive architecture diagram?</p> <p>3) What is the <code>MetaStore</code> (I am <strong>NOT</strong> referring to the Metastore database)? Is it a process that runs? If so, is it part of <code>hiveserver2</code>? According to the diagram, the <code>MetaStore</code> can be remote, so if it is a JVM process, which component does it belong to?</p> <p>4) The diagram says <code>Hive service JVM</code> and <code>MetaStore JVM Server</code>. Where do these components get installed? Are they part of the "server" side of Hive?</p> <p>5) The "Hive Architecture" diagram says "Hive Server". What is this? Is it what we call "HiveServer1" or "HiveServer2"?</p> <p>Can anyone help me understand this?</p>

<h3>Hive Services</h3> <ul> <li>HiveServer2</li> <li>Hive Metastore</li> <li>HCatalog + WebHCat</li> <li>Beeline &amp; Hive CLI</li> <li>Thrift client</li> <li>FileSystem :: HDFS and other compatible filesystems like S3</li> <li>Execution engine :: MapReduce, Tez, Spark</li> <li>Hive Web UI (added in Hive 2.x). The Tez and Spark UIs are related, but they are not really Hive services.</li> </ul> <h3>Driver</h3> <p>The JDBC/ODBC and Thrift interfaces have drivers.<br> There are also the processes that interpret the query and compile it down to code for the execution engine. I personally call that an interpreter or compiler, not a driver.</p> <h3>Metastore Server</h3> <p>It is not HiveServer2 itself. It is a service process that sits on top of an RDBMS (yes, you still need one when running Hive &amp; Hadoop).</p> <p>Supported databases for a remote metastore: Oracle, MySQL, Postgres<br> Embedded metastore (not recommended for production): Derby</p> <p>See the Hive Wiki.</p> <p><strong>Metastore JVM</strong></p> <p>The orange boxes show that you can deploy these services in the same JVM as the driver (interpreter) or as a remote server. The wiki describes these setups.</p> <p>I believe this is a sidecar process that maps HiveServer2 queries to metastore queries. For example, something has to translate a HiveQL statement into reads of table metadata from MySQL or Postgres.</p> <p>It can run on the server side, yes, but this is not a recommended setup, for fault-tolerance and performance reasons.</p> <p>HiveServer1 is deprecated. Feel free to read about it, but don't use it.</p>
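<p>To make the embedded / local / remote split from the MetaStore configuration diagram concrete, here is a minimal Python sketch (an illustration only, not code from the book or from Hive) that prints the <code>hive-site.xml</code> properties the Hive service JVM (HiveServer2 or the CLI) would use in each mode. The property names <code>javax.jdo.option.ConnectionURL</code>, <code>javax.jdo.option.ConnectionDriverName</code> and <code>hive.metastore.uris</code>, and the default metastore port 9083, are standard Hive settings; the host names, the MySQL database name and the MySQL Connector/J 8 driver class are assumptions.</p>

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical host names; replace with your own.
DB_HOST = "db-host"
METASTORE_HOST = "metastore-host"


def metastore_properties(mode: str) -> dict[str, str]:
    """Return the hive-site.xml properties that select a metastore
    deployment mode for the process running the Hive service."""
    if mode == "embedded":
        # Metastore service AND the Derby database both live inside the Hive service JVM.
        return {
            "javax.jdo.option.ConnectionURL": "jdbc:derby:;databaseName=metastore_db;create=true",
            "javax.jdo.option.ConnectionDriverName": "org.apache.derby.jdbc.EmbeddedDriver",
        }
    if mode == "local":
        # Metastore service still runs inside the Hive service JVM,
        # but it talks JDBC to a standalone RDBMS (MySQL, assumed here).
        return {
            "javax.jdo.option.ConnectionURL": f"jdbc:mysql://{DB_HOST}:3306/metastore",
            "javax.jdo.option.ConnectionDriverName": "com.mysql.cj.jdbc.Driver",
        }
    if mode == "remote":
        # Metastore service runs in its own JVM (e.g. `hive --service metastore`);
        # the Hive service only needs its Thrift address. The JDBC settings
        # move to the metastore server's own hive-site.xml.
        return {"hive.metastore.uris": f"thrift://{METASTORE_HOST}:9083"}
    raise ValueError(f"unknown mode: {mode}")


def to_hive_site_xml(props: dict[str, str]) -> str:
    """Render properties in the <configuration><property> layout of hive-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for mode in ("embedded", "local", "remote"):
        print(f"--- {mode} ---")
        print(to_hive_site_xml(metastore_properties(mode)))
```

<p>The point relevant to questions 3 and 4: when <code>hive.metastore.uris</code> is not set, the metastore service runs inside the same JVM as HiveServer2 (or the CLI); when it is set, HiveServer2 is just a Thrift client of a separate metastore server process, which can be installed on a different node.</p>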

Hive service, HiveServer2 & MetaStore service?

asked Apr 12 '18 14:04

CuriousMind

answered Sep 23 '22 04:09

OneCricketeer
