I have data organized in directories in a particular layout (shown below) and want to add it to a Hive table. I want to add all the data under the 2012 directory. All the names below are directory names, and the innermost directories (third level) hold the actual data files. Is there any way to load the data directly without changing this directory structure? Any pointers are appreciated.
/2012/
|
|---------2012-01
|            |---------2012-01-01
|            |---------2012-01-02
|            |...
|            |...
|            |---------2012-01-31
|
|---------2012-02
|            |---------2012-02-01
|            |---------2012-02-02
|            |...
|            |...
|            |---------2012-02-28
|
|---------2012-03
|...
|...
|---------2012-12
Queries tried so far without luck:
CREATE EXTERNAL TABLE sampledata
(datestr string, id string, locations string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
LOCATION '/path/to/data/2012/*/*';
CREATE EXTERNAL TABLE sampledata
(datestr string, id string, locations string)
partitioned by (ystr string, ymstr string, ymdstr string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|';
ALTER TABLE sampledata
ADD
PARTITION (ystr ='2012')
LOCATION '/path/to/data/2012/';
SOLUTION: This small parameter fixed my issue. Adding it to the question in case it helps others:
SET mapred.input.dir.recursive=true;
Answering my own question with the solution that works for my case:
SET mapred.input.dir.recursive=true;
ALTER TABLE sampledata
ADD
PARTITION (ystr ='2012', ymstr='2012-01', ymdstr='2012-01-01')
LOCATION '/path/to/data/2012/2012-01/2012-01-01';
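The statement above registers a single day. To cover all of 2012 with the same three-level partitioning, one option is to generate the statements rather than type them by hand. The following is only a sketch under stated assumptions: it uses Python because HiveQL has no loop construct, the helper name `partition_statements` is made up for illustration, and it assumes every calendar day has a directory named exactly as in the layout from the question. `IF NOT EXISTS` is added so the script can be re-run without failing on partitions that are already registered.

```python
from datetime import date, timedelta


def partition_statements(year, base_path, table="sampledata"):
    """Yield one ALTER TABLE ... ADD PARTITION statement per calendar day of `year`.

    Hypothetical helper: assumes the <base>/<yyyy>/<yyyy-mm>/<yyyy-mm-dd>
    layout and the partition columns ystr, ymstr, ymdstr from the question.
    """
    day = date(year, 1, 1)
    while day.year == year:
        ystr = f"{day:%Y}"
        ymstr = f"{day:%Y-%m}"
        ymdstr = f"{day:%Y-%m-%d}"
        yield (
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (ystr='{ystr}', ymstr='{ymstr}', ymdstr='{ymdstr}') "
            f"LOCATION '{base_path}/{ystr}/{ymstr}/{ymdstr}';"
        )
        day += timedelta(days=1)


if __name__ == "__main__":
    # Print the statements so they can be redirected into a .hql file.
    for stmt in partition_statements(2012, "/path/to/data"):
        print(stmt)
```

The output can be saved to a file and run with `hive -f`. Note that 2012 is a leap year, so the script also emits a partition for 2012-02-29; if some days have no directory, filter those statements out before running them.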