
Can I point multiple locations to the same Hive external table?

I need to process several months of data at the same time. Is there a way to point multiple folders at a single external table? For example: create external table logdata(col1 string, col2 string........) location s3://logdata/april, s3://logdata/march

Naresh asked Jun 03 '13 13:06




1 Answer

Simple answer: no. A Hive external table takes exactly one location when it is created, because the metastore needs a single place to record where the table lives.

That said, you can get the same effect with partitions: each partition can have its own location, which fits your case since your data is already split by month.

So create your table like this:

create external table logdata(col1 string, col2 string) partitioned by (month string) location 's3://logdata'

Then you can add partitions like this:

alter table logdata add partition(month='april') location 's3://logdata/april'

Do this for every month. You can then query the table and filter on whichever partitions you want, and Hive will only read the directories of the partitions you ask for (for example, if you are only processing April and June, Hive will not read May).
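
As a rough sketch of the whole flow for the two months in the question: the statements below assume the logdata table from above already exists, that both S3 directories contain data, and that your Hive version accepts several PARTITION clauses in one ALTER TABLE (if it does not, run one ALTER TABLE per month as shown earlier).

```sql
-- Register both month directories as partitions of the same external table.
-- IF NOT EXISTS makes the statement safe to re-run.
ALTER TABLE logdata ADD IF NOT EXISTS
  PARTITION (month='march') LOCATION 's3://logdata/march'
  PARTITION (month='april') LOCATION 's3://logdata/april';

-- Check which partitions the metastore now knows about.
SHOW PARTITIONS logdata;

-- Process both months together; filtering on the partition column
-- limits the scan to the two registered directories.
SELECT col1, col2
FROM logdata
WHERE month IN ('march', 'april');
```

The month column is not stored in the files themselves; Hive fills it in from the partition definition, so it can be used in WHERE and GROUP BY like any other column.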

Charles Menguy answered Oct 21 '22 22:10