
Difference between `load data inpath` and `location` in Hive?

At my firm, I see these two commands used frequently, and I'd like to be aware of the differences, because their functionality seems the same to me:

1

create table <mytable>
(name string,
number double);

load data inpath '/directory-path/file.csv' into table <mytable>;

2

create table <mytable>
(name string,
number double)
location '/directory-path/file.csv';

They both copy the data from the directory on HDFS into the directory for the table on HIVE. Are there differences that one should be aware of when using these? Thank you.

makansij asked Feb 18 '16 05:02


1 Answer

Yes, they serve entirely different purposes.

The load data inpath command loads data files into a Hive table. 'LOCAL' signifies that the input file is on the local file system, in which case the file is copied into the table's directory. If 'LOCAL' is omitted, Hive looks for the file in HDFS and moves it into the table's directory.

load data inpath '/directory-path/file.csv' into table <mytable>;
load data local inpath '/local-directory-path/file.csv' into table <mytable>;

The LOCATION keyword lets you point a table at any HDFS directory for its storage, instead of the folder under the path given by the configuration property hive.metastore.warehouse.dir.

In other words, with specified LOCATION '/your-path/', Hive does not use a default location for this table. This comes in handy if you already have data generated.
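As a minimal sketch of that case, assume comma-delimited files already sit in an HDFS directory. The table name sales_ext and the path /data/sales/ are hypothetical, chosen only for illustration; the columns are the ones from the question. Note that LOCATION names a directory, not a single file.

```sql
-- Hypothetical table name and path, for illustration only.
-- LOCATION must name a directory; Hive reads every file inside it.
CREATE EXTERNAL TABLE sales_ext (
  name   STRING,
  number DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/sales/';

-- No LOAD DATA step is needed: the files are queried where they already are.
SELECT name, number FROM sales_ext LIMIT 10;
```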

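Remember, LOCATION is most commonly used with EXTERNAL tables, where dropping the table leaves the data in place. It can also be specified for a regular (managed) table, but dropping that table then deletes the data at that location. If LOCATION is omitted, the default location is used.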

To summarize, load data inpath tells Hive where to find input files to bring into a table, and the LOCATION keyword tells Hive where on HDFS the table's data is stored.
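If you are unsure which directory an existing table uses, one way to check is DESCRIBE FORMATTED, which reports the table's storage location and whether it is managed or external. The table name below is the placeholder from the question.

```sql
-- <mytable> is the placeholder used in the question; substitute a real table name.
-- In the output, look for the "Location:" and "Table Type:" rows
-- (MANAGED_TABLE or EXTERNAL_TABLE).
DESCRIBE FORMATTED <mytable>;
```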

References: https://cwiki.apache.org/confluence/display/Hive/GettingStarted https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Sachin Gaikwad answered Sep 21 '22 00:09