I need to process data for multiple months simultaneously. So, is there an option to point an external table at multiple folders?
e.g.
Create external table logdata(col1 string, col2 string, ...) location 's3://logdata/april', 's3://logdata/march'
Note that the reverse is allowed: you can point multiple tables to the same location on HDFS, just not one table at multiple locations.
1) CREATE EXTERNAL TABLE IF NOT EXISTS jsont1( json string ) LOCATION '/jsam'; Now I need to change the location that jsont1 points to. It is OK to omit the nameservice/namenode part of the URI (the defaultFs is then used), but the path itself must be absolute, i.e. start with a `/`.
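For example, a minimal sketch using ALTER TABLE ... SET LOCATION (the path /jsam2 is just a placeholder for wherever your data actually lives; depending on your Hive version you may need the fully qualified URI):

ALTER TABLE jsont1 SET LOCATION '/jsam2';
-- equivalent, with the defaultFs spelled out explicitly:
ALTER TABLE jsont1 SET LOCATION 'hdfs://namenode:8020/jsam2';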
By default, Hive stores tables under the /user/hive/warehouse directory. For instance, a table named students will be located at /user/hive/warehouse/students.
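You can check where a table actually lives with DESCRIBE FORMATTED (students here is just the example table from above):

DESCRIBE FORMATTED students;
-- the output includes a Location: field, e.g.
-- hdfs://namenode:8020/user/hive/warehouse/students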
Hive expects all the files for one table to use the same delimiter, the same compression, and so on. So you cannot use a single Hive table on top of files with multiple formats.
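In other words, whatever format you declare applies to every file under the table's location. A minimal sketch, assuming comma-delimited text files (the table name, columns, and path are illustrative):

CREATE EXTERNAL TABLE logdata_csv (col1 string, col2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://logdata/csv/';
-- every file under s3://logdata/csv/ must be comma-delimited text;
-- mixing in, say, tab-delimited or ORC files there will produce garbage or errors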
Simple answer: no, the location of a Hive external table during creation has to be unique; the metastore needs this to understand where your table lives.
That being said, you can probably get away with using partitions: you can specify a location for each of your partitions, which seems to be what you want ultimately since you are splitting by month.
So create your table like this:
create external table logdata(col1 string, col2 string) partitioned by (month string) location 's3://logdata'
Then you can add partitions like this:
alter table logdata add partition(month='april') location 's3://logdata/april'
You do this for every month, and then you can query your table specifying whichever partitions you want; Hive will only look at the directories for which you actually need data (for example, if you're only processing april and june, Hive will not read the may directory).
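A sketch of what that looks like end to end (the extra month values and the aggregate query are illustrative):

alter table logdata add partition(month='may') location 's3://logdata/may'
alter table logdata add partition(month='june') location 's3://logdata/june'

-- only the april and june directories are read; may is pruned
select month, count(*)
from logdata
where month in ('april', 'june')
group by month;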