I am trying to understand hive
in terms of architecture, and I am referring to Tom White's book on Hadoop.
I came across the following terms in regards to hive: Hive Services
, hiveserver2
, metastore
among others.
Referring to below diagrams from the Book (Hadoop: The definitive Guide).
I am not able to understand the following:
1) What is Hive Services
in Hive architecture diagram? Is it same when we say hiveserver2
?
2) What is Driver
in Hive architecture diagram?
3) What is MetaStore
(I am NOT referring to Metastore Database). Is it some process which runs? If so, is this part of hiveserver2
? As per the diagram MetaStore
can be remote, so if this is a JVM process, to which component it belongs to?
4) It say Hive service JVM
, MetaStore JVM Server
. But, where do these components gets installed? Are they part of the "server" side of "hive"?
5) In "Hive Architecture" diagram, it say "Hive Server"? What is this? Is this the one which we say "Hive Server 1" , "Hive Server2".
Can anyone help understand this?
Check if the HiveServer2 service is running and listening on port 10000 using netstat command.
Hive allows users to read, write, and manage petabytes of data using SQL. Hive is built on top of Apache Hadoop, which is an open-source framework used to efficiently store and process large datasets. As a result, Hive is closely integrated with Hadoop, and is designed to work quickly on petabytes of data.
Beeline is a thin client that also uses the Hive JDBC driver but instead executes queries through HiveServer2, which allows multiple concurrent client connections and supports authentication.
The JDBC/ODBC or Thrift interfaces have drivers.
There are also the processes that interpret the query and compile it down to the execution engine code. I personally call that an interpreter or compiler, not a driver
Not part of HiveServer2. It is literally a process running on top of an RDBMS (yes, you still need these when running Hive & Hadoop).
Supported Remote Metastore servers = Oracle, MySQL, Postgres
Embedded Metastore (not recommended for production) = Derby
See Hive Wiki
Metastore JVM
The orange boxes are showing you can deploy these services as part of the same JVM as the driver (interpreter) or as a remote server. The wiki describes these setups.
I believe this is a side-car process that maps the HiveServer2 queries to the MetaStore queries. For example, how do you translate the HiveQL into a process that reads metadata from MySQL or Postgres?
It can run on the server-side, yes, but this is not a recommended setup for fault tolerance and performance reasons.
HiveServer1 is deprecated. Feel free to read about it, but don't use it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With