I hope someone can help me out with creating external Hive partitioned tables that automatically pick up data from comma-delimited files residing in an HDFS directory. My understanding, or lack thereof, is that when you define a CREATE EXTERNAL TABLE that is PARTITIONED and give it a LOCATION, Hive should recursively scan/read every sub-directory and load the data into the newly created partitioned external table. The following should provide some additional insight into my troubles…
Sample HDFS directory structure:
/data/output/dt=2014-01-01
/data/output/dt=2014-01-02
/data/output/dt=2014-01-03 ...
/data/output/dt=2014-05-21
And each 'dt=' sub-directory contains the delimited file.
The following is an example of my CREATE EXTERNAL TABLE syntax:
CREATE EXTERNAL TABLE master_test (
  UID string,
  lname string,
  fname string,
  addr string,
  city string,
  state string,
  orderdate string,
  shipdate string)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/output/';
Upon the creation of my master_test external table, I expected all of my delimited files to already be contained within the table at CREATE time. The only way for me to get data into the newly defined external table is through an ALTER TABLE ADD PARTITION statement, for example: ALTER TABLE master_test ADD PARTITION (dt='2014-04-16'). Or, if I explicitly define the location of the delimited file, it will add the individual file to the defined table.
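For reference, the manual workaround described above looks like this in full (a sketch against the sample layout; the explicit LOCATION form is optional here, since the dt= directories already sit under the table's LOCATION):

ALTER TABLE master_test ADD PARTITION (dt='2014-04-16');
-- or, pointing the partition at an explicit directory:
ALTER TABLE master_test ADD IF NOT EXISTS PARTITION (dt='2014-04-16')
LOCATION '/data/output/dt=2014-04-16';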
Any suggestions or guidance is greatly appreciated.
You can use MSCK REPAIR TABLE to automatically discover the partitions. Take a look at the doc: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)
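Because your sub-directories already follow Hive's key=value naming convention (dt=2014-01-01, and so on), MSCK REPAIR TABLE can register all of them in the metastore in one statement. A minimal sketch against the table above:

MSCK REPAIR TABLE master_test;
-- confirm the metastore now knows about each dt= directory
SHOW PARTITIONS master_test;

Note that this only adds partition metadata; the data files stay where they are, which is exactly what you want for an external table.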