
Hive service, HiveServer2 & MetaStore service?

I am trying to understand hive in terms of architecture, and I am referring to Tom White's book on Hadoop.

I came across the following terms with regard to Hive: Hive Services, HiveServer2, and metastore, among others.

I am referring to the diagrams below, from the book (Hadoop: The Definitive Guide).

Hive Architecture:

[diagram not shown]

MetaStore configuration:

[diagram not shown]

Hive Architecture, showing what the "Driver" is:

[diagram not shown]

I am not able to understand the following:

1) What is "Hive Services" in the Hive architecture diagram? Is it the same thing as HiveServer2?

2) What is the "Driver" in the Hive architecture diagram?

3) What is the MetaStore (I am NOT referring to the Metastore Database)? Is it a process that runs? If so, is it part of HiveServer2? As per the diagram, the MetaStore can be remote, so if it is a JVM process, which component does it belong to?

4) The diagram says "Hive service JVM" and "MetaStore JVM Server". But where do these components get installed? Are they part of the "server" side of Hive?

5) The "Hive Architecture" diagram says "Hive Server". What is this? Is it what we call "Hive Server 1" or "Hive Server 2"?

Can anyone help me understand this?

asked Apr 12 '18 14:04 by CuriousMind




1 Answer

Hive Services

  • HiveServer2
  • Hive Metastore
  • HCatalog + WebHCat
  • Beeline & Hive CLI
  • Thrift client
  • FileSystem :: HDFS and other compatible filesystems like S3
  • Execution engine :: MapReduce, Tez, Spark
  • Hive Web UI (added in Hive 2.x). Arguably also the Tez or Spark UI, though those are not really part of Hive

Driver

The JDBC/ODBC and Thrift interfaces each have client-side drivers.
Separately, there are the processes that interpret the query and compile it down to code for the execution engine. That is what the "Driver" box in the diagram refers to, although I personally call it an interpreter or compiler, not a driver.
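
To make the first meaning of "driver" concrete: a JDBC client such as Beeline reaches HiveServer2 through a `jdbc:hive2://` URL. Below is a minimal Python sketch that only builds such a URL. The host name is hypothetical, port 10000 is assumed as the usual HiveServer2 default, and `hive_jdbc_url` is an illustrative helper, not part of any Hive library.

```python
def hive_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.

    Assumptions: 10000 is the default HiveServer2 port and "default" is the
    default database; adjust both for your cluster.
    """
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # hs2.example.com is a placeholder host, not a real server.
    print(hive_jdbc_url("hs2.example.com"))
```

A URL like this is what you would pass to Beeline or any other JDBC client; the query compilation itself happens on the HiveServer2 side.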

Metastore Server

Not part of HiveServer2. It is literally a process running on top of an RDBMS (yes, you still need these when running Hive & Hadoop).

Supported databases behind a remote metastore = Oracle, MySQL, Postgres
Embedded metastore (not recommended for production) = Derby

See Hive Wiki
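
As a rough illustration of how these setups differ in configuration, here is a Python sketch that returns the key `hive-site.xml` property for each metastore mode. The property names `javax.jdo.option.ConnectionURL` and `hive.metastore.uris` are real Hive settings; the function name, the host names, and the choice of port 9083 (the usual metastore Thrift default) are assumptions for the example.

```python
def metastore_properties(mode, db_url=None, metastore_host=None):
    """Return the key hive-site.xml property for a given metastore mode.

    This is a sketch, not a complete configuration: real setups also need
    driver class, user name, password, and so on.
    """
    if mode == "embedded":
        # Metastore service and Derby database both live in the Hive JVM.
        return {
            "javax.jdo.option.ConnectionURL":
                "jdbc:derby:;databaseName=metastore_db;create=true"
        }
    if mode == "local":
        # Metastore service runs in the Hive JVM but talks to an external RDBMS.
        return {"javax.jdo.option.ConnectionURL": db_url}
    if mode == "remote":
        # Metastore runs as its own JVM process; clients reach it over Thrift.
        # Port 9083 is assumed here as the common default.
        return {"hive.metastore.uris": f"thrift://{metastore_host}:9083"}
    raise ValueError(f"unknown metastore mode: {mode}")


if __name__ == "__main__":
    # Host names below are placeholders.
    print(metastore_properties("embedded"))
    print(metastore_properties("local", db_url="jdbc:mysql://db.example.com/metastore"))
    print(metastore_properties("remote", metastore_host="metastore.example.com"))
```

In the remote case, the JDBC connection settings live on the metastore server itself, while HiveServer2 and other clients only need the Thrift URI.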

Metastore JVM

The orange boxes show that you can deploy these services either in the same JVM as the driver (interpreter) or as a remote server. The wiki describes these setups.

I believe this is a side-car process that maps the HiveServer2 queries to the MetaStore queries. For example, how do you translate the HiveQL into a process that reads metadata from MySQL or Postgres?

It can run on the server-side, yes, but this is not a recommended setup for fault tolerance and performance reasons.

HiveServer1 is deprecated. Feel free to read about it, but don't use it.

answered Sep 23 '22 04:09 by OneCricketeer