
metastore_db created wherever I run Hive


A metastore_db folder is created in whatever directory I run a Hive query from. Is there any way to keep a single metastore_db in a defined location and stop it from being created all over the place? Does it have anything to do with hive.metastore.local?

asked Nov 29 '12 at 11:11 by darcyy



1 Answer

The property of interest here is javax.jdo.option.ConnectionURL. Its default value is jdbc:derby:;databaseName=metastore_db;create=true. This value says that Hive will use embedded Derby as its metastore, that the metastore database lives at metastore_db, and that it will be created if it doesn't already exist.

Note that the location of the metastore (metastore_db) is a relative path, so it is resolved against the directory you launch Hive from. That is why a new metastore_db appears wherever you run Hive. If you change the databaseName part of this property (in your hive-site.xml) to an absolute path, the same metastore will be used no matter where Hive is started.
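As a minimal sketch of that change, the hive-site.xml entry could look like the following. The path /home/hive/metastore_db is only a placeholder I picked for illustration; use any absolute directory that the user running Hive can write to.

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- /home/hive/metastore_db is a hypothetical absolute path, not a Hive default -->
    <value>jdbc:derby:;databaseName=/home/hive/metastore_db;create=true</value>
  </property>
</configuration>
```

With create=true left in place, Derby should create the database at that path on first use if it isn't there yet. Tables registered in any of the old per-directory metastore_db folders will not show up in the new one.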

I must warn you, though, that an embedded Derby metastore can only be accessed by one user at a time. Hive uses embedded Derby by default to give an out-of-the-box experience and to make testing easy. For any practical system, I would recommend moving to a standalone "real" database such as MySQL or PostgreSQL. Instructions on how to do that are available here.
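For a rough idea of what the MySQL variant involves, a hive-site.xml sketch might look like this. The host name, database name, user, and password below are all made-up placeholders, and the driver class shown is the one used by the older MySQL Connector/J 5.x releases (newer releases use com.mysql.cj.jdbc.Driver), so treat it as an assumption to check against your own setup.

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- dbhost.example.com and hive_metastore are hypothetical names -->
    <value>jdbc:mysql://dbhost.example.com:3306/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <!-- assumes Connector/J 5.x; verify the class name for your driver version -->
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <!-- placeholder credentials -->
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>change-me</value>
  </property>
</configuration>
```

The MySQL JDBC driver jar also has to be on Hive's classpath (typically by placing it in Hive's lib directory), and the database and user must exist on the MySQL side before Hive connects.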

answered Sep 28 '22 at 18:09 by Mark Grover