
What is the difference between the hive jdbc client and the hive metastore java api?

I was using Hive JDBC, but I then learned that there is also a Hive metastore Java API, through which you can likewise connect to Hive and manipulate a Hive database.

What exactly is the difference between these two approaches?

Sorry if this is an obvious question; any information would be highly appreciated.

sachingupta asked Sep 26 '14 08:09



2 Answers

As far as I understand, there are two ways to connect to Hive:

  1. Through the Hive metastore server, which in turn connects in the background to a relational database such as MySQL that stores the schema metadata. It generally runs on port 9083.
  2. Through the Hive JDBC server, called HiveServer2, which listens on port 10000 by default (10001 when HTTP transport mode is used).

In earlier releases of Hive, HiveServer2 was not very stable, and its multi-threading support was limited. Things have probably improved since then, I'd imagine.

So for the JDBC API: yes, it lets you communicate with Hive using JDBC and SQL.
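To make the JDBC route concrete, here is a minimal sketch using only the standard java.sql classes. The class name HiveJdbcSketch, its helper methods, and the host, port, database, and user values are placeholders chosen for illustration, not anything taken from this thread. It assumes the Hive JDBC driver (hive-jdbc) is on the classpath and that a HiveServer2 instance is reachable; without those, the connection attempt fails and the error is printed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class HiveJdbcSketch {

    // Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Runs one query over JDBC and returns the first column of every row as a string.
    // Needs the Hive JDBC driver on the classpath and a running HiveServer2.
    static List<String> firstColumn(String url, String user, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Placeholder connection details (assumptions); adjust to your deployment.
        String url = hiveUrl("localhost", 10000, "default");
        try {
            for (String value : firstColumn(url, "hive", "select * from employee")) {
                System.out.println(value);
            }
        } catch (SQLException e) {
            // Reached when no driver is registered or the server is unreachable.
            System.err.println("Could not query Hive at " + url + ": " + e.getMessage());
        }
    }
}
```

Everything here goes through HiveServer2 as SQL text, so the client never talks to the metastore database directly.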

For metastore connectivity, there appear to be two kinds of operation:

  1. running SQL queries (DML)
  2. performing DDL operations

DDL:

For DDL, the metastore API comes in handy; the org.apache.hadoop.hive.metastore.HiveMetaStoreClient class can be used for that purpose.

DML:

What I have found useful here is the org.apache.hadoop.hive.ql.Driver class (https://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/Driver.html). It has a method called run() that lets you execute a SQL statement and get the result back. For example, you can do the following:

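// Assumes hiveConf is an initialized HiveConf and that db and table are String names.
Driver driver = new Driver(hiveConf);
HiveMetaStoreClient client = new HiveMetaStoreClient(hiveConf);
SessionState.start(new CliSessionState(hiveConf));
// DML example: execute a SQL statement through the query driver
driver.run("select * from employee");
// DDL example: drop a table through the metastore client
client.dropTable(db, table);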
ameet chaubal answered Oct 22 '22 20:10


The metastore in Hive, as the name indicates, is a store for the metadata of Hive databases. This store is usually an RDBMS. The metastore API lets you interact with that RDBMS to change the metadata, not the actual Hive data. For normal usage you will probably never want or need to use it. I would think it is meant for people building tools that work with the metastore, not for day-to-day use.

Chetya answered Oct 22 '22 20:10