Hive loading in partitioned table

Tags: hive, loading

I have a log file in HDFS whose values are comma-delimited. For example:

2012-10-11 12:00,opened_browser,userid111,deviceid222

Now I want to load this file into a Hive table that has the columns "timestamp" and "action" and is partitioned by "userid" and "deviceid". How can I tell Hive to use the last two columns of the log file as the table's partition columns? All the examples, e.g. "hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');", require the partition to be spelled out in the script, but I want the partitions to be set up automatically from the HDFS file.

One solution is to create an intermediate non-partitioned table with all four columns, populate it from the file, and then run INSERT INTO TABLE first_table PARTITION (userid, deviceid) SELECT `timestamp`, action, userid, deviceid FROM intermediate_table; but that is an additional task, and we would end up with two very similar tables. Or should we create an external table as the intermediate one?

asked Oct 30 '12 21:10

Valery Yesypenko

1 Answer

Ning Zhang has a great response on the topic at http://grokbase.com/t/hive/user/114frbfg0y/can-i-use-hive-dynamic-partition-while-loading-data-into-tables.

The quick context is:

1. LOAD DATA simply copies the data files. It does not read their contents, so it cannot work out which partition each row belongs to.
2. The suggestion is to load the data into an intermediate table first (or use an external table pointing to all the files), and then let a dynamic-partition insert load it into the partitioned table.
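
The external-table variant of that suggestion could look roughly like the sketch below. It is only an illustration under assumptions: the table names log_staging and first_table, the STRING column types, and the directory /user/myname/logs are hypothetical and not taken from the linked thread. The column name timestamp is backtick-quoted because it clashes with a Hive type name; renaming the column would avoid the quoting.

```sql
-- Hypothetical staging table: an external table that only points at the raw
-- comma-delimited files in HDFS. No data is copied or moved.
CREATE EXTERNAL TABLE log_staging (
  `timestamp` STRING,
  action      STRING,
  userid      STRING,
  deviceid    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/myname/logs';   -- assumed directory holding the log files

-- Target table: the partition columns are declared in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE first_table (
  `timestamp` STRING,
  action      STRING
)
PARTITIONED BY (userid STRING, deviceid STRING);

-- Enable dynamic partitioning. nonstrict mode is needed here because
-- no partition column is given a static value.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The partition columns must come last in the SELECT list,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE first_table PARTITION (userid, deviceid)
SELECT `timestamp`, action, userid, deviceid
FROM log_staging;
```

Because the staging table is external, dropping it afterwards removes only the table definition and leaves the files in HDFS, which goes some way toward the "two very similar tables" concern. Partitioning by both userid and deviceid can produce a large number of partitions, so the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may need to be raised.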

answered Sep 18 '22 18:09

Denny Lee
