 

Where does Hive store its tables?

Tags:

hive

I am new to Hadoop and have just started working with Hive. In my understanding, it provides a query language to process data in HDFS. With HiveQL we can create tables and load data into them from HDFS.

So my question is: where are those tables stored? Specifically, if we have a 100 GB file in HDFS and we want to make a Hive table out of that data, what will be the size of that table and where will it be stored?

If my understanding of this concept is wrong, please correct me.

talin asked Mar 26 '15 11:03



People also ask

Where are Hive tables stored in HDFS?

An internal table is stored on HDFS in the /user/hive/warehouse directory, which is its default storage location. This location can be changed by updating the path in the configuration file: hive. metastore.

Where do Hive external tables store data?

External tables are stored outside the warehouse directory. They can access data stored in sources such as remote HDFS locations or Azure Storage Volumes. When an external table is dropped, only the metadata associated with the table is deleted; the table data remains untouched by Hive.


2 Answers

If the data is 100 GB, you should consider a Hive external table (as opposed to a "managed table"; the two differ as described below).

With an external table, the data itself stays on HDFS in the file path that you specify (you may specify a directory of files, as long as they all have the same structure), and Hive only records the table's schema and location in the metastore. A managed table, by contrast, stores the data "in Hive", that is, in Hive's own warehouse directory, which is also on HDFS.

When you drop a managed table, Hive deletes the underlying data as well. Dropping an external table only removes the metadata in the metastore that references that data.
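To make the drop behaviour concrete, here is a toy Python model. It is only a sketch: `ToyCluster`, the table names, and the `/data/logs` path are hypothetical stand-ins, not a real Hive or HDFS API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy stand-in for HDFS plus the Hive metastore (not a real Hive API)."""
    hdfs_paths: set = field(default_factory=set)   # directories that hold data
    metastore: dict = field(default_factory=dict)  # table name -> (location, is_external)

    def create_table(self, name, location, external):
        # Register the table; its data lives at `location` on (toy) HDFS.
        self.hdfs_paths.add(location)
        self.metastore[name] = (location, external)

    def drop_table(self, name):
        # The metadata is always removed from the metastore.
        location, external = self.metastore.pop(name)
        if not external:
            # Managed table: Hive owns the data, so the data is deleted too.
            self.hdfs_paths.discard(location)


cluster = ToyCluster()
cluster.create_table("managed_logs", "/user/hive/warehouse/managed_logs", external=False)
cluster.create_table("external_logs", "/data/logs", external=True)

cluster.drop_table("managed_logs")
cluster.drop_table("external_logs")

print(sorted(cluster.hdfs_paths))  # ['/data/logs']
```

After both drops the metastore is empty, but only the external table's directory is still present.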

Either way, you are using only 100 GB as viewed by the user, and you are taking advantage of HDFS's robustness through replication of the data.
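As a rough illustration of the size question, the arithmetic can be sketched in Python. This assumes a replication factor of 3, which is the usual HDFS default for `dfs.replication`; your cluster may be configured differently, and the function name is made up for this example.

```python
def raw_hdfs_usage_gb(logical_gb, replication=3):
    """Raw disk space consumed across the cluster for a given logical size.

    `replication` is an assumption (3 is the common HDFS default).
    """
    return logical_gb * replication


# The user still sees 100 GB; the cluster stores every block `replication` times.
print(raw_hdfs_usage_gb(100))  # 300
```

Creating a Hive table over the file does not change this figure, since the table refers to the same HDFS files.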

mlegge answered Oct 21 '22 22:10



Hive creates a directory on HDFS for each table. If you don't specify a location, it creates the directory under /user/hive/warehouse on HDFS. After a LOAD command, the files are moved into the table's directory under the warehouse (/warehouse/tablename). You can also point a table at an existing HDFS directory, including one whose files are partitioned, or use an external table.
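The default layout can be sketched as a small Python helper. This is an illustration only: `default_table_location` is a hypothetical function, and it assumes the warehouse root is left at /user/hive/warehouse and that tables in a non-default database go under a `<database>.db` subdirectory, which is Hive's usual convention.

```python
import posixpath


def default_table_location(table, database="default", warehouse="/user/hive/warehouse"):
    """Guess where a managed table's files land when no LOCATION is given."""
    table = table.lower()  # Hive treats table names as case-insensitive
    if database == "default":
        # Tables in the default database sit directly under the warehouse root.
        return posixpath.join(warehouse, table)
    # Other databases get their own "<database>.db" directory.
    return posixpath.join(warehouse, f"{database.lower()}.db", table)


print(default_table_location("sales"))               # /user/hive/warehouse/sales
print(default_table_location("sales", "analytics"))  # /user/hive/warehouse/analytics.db/sales
```

You can check the actual location of a table with `DESCRIBE FORMATTED tablename;` in Hive.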

Sravan K Reddy answered Oct 21 '22 22:10
