I have a log file in HDFS, values are delimited by comma. For example:
2012-10-11 12:00,opened_browser,userid111,deviceid222
Now I want to load this file to Hive table which has columns "timestamp","action" and partitioned by "userid","deviceid". How can I ask Hive to take that last 2 columns in log file as partition for table? All examples e.g. "hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');"
require definition of partitions in the script, but I want partitions to set up automatically from HDFS file.
The one solution is to create intermediate non-partitioned table with all that 4 columns, populate it from file and then make an INSERT into first_table PARTITION (userid,deviceid) select from intermediate_table timestamp,action,userid,deviceid;
but that is and additional task and we will have 2 very similiar tables.. Or we should create external table as intermediate.
Q 16 - What is the disadvantage of using too many partitions in Hive tables? A - The extra columns are replaced with NULL B - The row is skipped C - The import halts with error D - The Columns are ignored. Q 18 - Consider the below two sets of queries.
You can use this command to create that: hive> INSERT INTO TABLE Y PARTITION(state) SELECT * from X; Here you should ensure that the partition column is the last column of the non-partitioned table.
The general syntax for showing partitions is as follows: SHOW PARTITIONS [db_name.] table_name [PARTITION(partition_spec)];
Ning Zhang has a great response on the topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables.
The quick context is that:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With