Hive Managed Table vs External Table : LOCATION directory

Tags:

hive

I have been going through some Hive books and tutorials. One of the books, Hadoop in Practice, says:

When you create an external (unmanaged) table, Hive keeps the data in the directory specified by the LOCATION keyword intact. But if you were to execute the same CREATE command and drop the EXTERNAL keyword, the table would be a managed table, and Hive would move the contents of the LOCATION directory into /user/hive/warehouse/stocks, which may not be the behavior you expect.

I created a managed table with the LOCATION keyword and then loaded data into it from an HDFS file. But I could not see any directory created under /user/hive/warehouse; instead, the new directory was created at the LOCATION I specified. So I think that if I create a managed table with a LOCATION, nothing is created in the Hive warehouse directory. Is this understanding correct?

Also, if the input file for the LOAD command is in HDFS, then both internal and external tables will move the data to their location. Is this understanding also correct?

861

asked Jul 09 '15 07:07

user1060418

2 Answers

In both cases (managed or external), LOCATION is optional. Whenever you specify LOCATION, the data is stored at that HDFS path, irrespective of which kind of table you are creating (managed or external). If you don't specify LOCATION, the default warehouse path configured in hive-site.xml is used.
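
A minimal sketch of how this could be checked. The table name `stocks`, its columns, the directory `/data/stocks`, and the input file `/tmp/stocks.csv` are all hypothetical placeholders. `DESCRIBE FORMATTED` reports the table's location and table type, so it shows where Hive actually keeps the data; `LOAD DATA INPATH` without `LOCAL` moves the source file rather than copying it.

```sql
-- Hypothetical names throughout: table, columns, and HDFS paths.
CREATE TABLE stocks (
  symbol STRING,
  price  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks';

-- The output includes "Location:" and "Table Type:" (MANAGED_TABLE here),
-- so you can confirm the data lives at the LOCATION, not in the warehouse.
DESCRIBE FORMATTED stocks;

-- Without LOCAL, LOAD DATA INPATH moves the HDFS file into the table's
-- location; this is the same for managed and external tables.
LOAD DATA INPATH '/tmp/stocks.csv' INTO TABLE stocks;
```

If the `LOCATION` clause were left out, the table directory would instead be created under the warehouse directory set by `hive.metastore.warehouse.dir`.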

189

answered Nov 15 '22 07:11

Ravindra Phule

First of all, when you create a managed table with the LOCATION keyword, Hive does not create a directory at the specified location; instead it throws an exception: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:hdfs://path/of/the/given/location is not a directory or unable to create one).
This means that the directory named in the DDL's LOCATION clause must already exist; otherwise the exception above is thrown.
Once the directory exists, you can run the DDL with the LOCATION clause.
You can then run select * from <table> to view the data, without having to load anything.
But when you drop this table, the data is deleted from HDFS along with the metadata (unlike with external tables).
This is the main thing to understand about a managed table with the LOCATION keyword: it behaves partly like an external table and partly like a managed table.
External, in that you don't have to load the data; you just specify the location.
Managed, in that when you drop the table, the data is deleted as well.
Hope that makes sense.
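
The drop behavior described above can be sketched as follows. The table names, columns, and the directories `/data/stocks` and `/data/stocks_ext` are hypothetical, and the sketch assumes both directories already exist in HDFS and contain comma-delimited files. Exact behavior can vary with Hive version and configuration (for example, dropped data may go to the HDFS trash if trash is enabled).

```sql
-- Hypothetical names; assumes /data/stocks and /data/stocks_ext already
-- exist in HDFS and hold comma-delimited files.
CREATE TABLE stocks_managed (
  symbol STRING,
  price  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks';

-- Reads the files already present at the location; no LOAD is needed.
SELECT * FROM stocks_managed LIMIT 10;

-- Managed table: removes the metadata and the files under /data/stocks.
DROP TABLE stocks_managed;

CREATE EXTERNAL TABLE stocks_external (
  symbol STRING,
  price  DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks_ext';

-- External table: removes only the metadata; the files stay in place.
DROP TABLE stocks_external;
```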

answered Nov 15 '22 07:11

aiman
