 

Hadoop/Hive: Loading data from a .csv on a local machine

As this is coming from a newbie...

I had Hadoop and Hive set up for me, so I can run Hive queries on my computer against data on an AWS cluster. Can I run Hive queries on .csv data stored on my computer, like I did with MS SQL Server?

How do I load .csv data into Hive then? What does it have to do with Hadoop, and which mode should I run it in?

What settings should I care about so that, if I do something wrong, I can always go back and run queries on Amazon without compromising what was set up for me earlier?

asked Oct 11 '13 by mel

People also ask

How do I load local Hive data?

LOAD DATA [LOCAL] INPATH '<The table data location>' [OVERWRITE] INTO TABLE <table_name>; Note: the LOCAL switch specifies that the data being loaded is available in our local file system. If the LOCAL switch is not used, Hive will treat the location as an HDFS path.
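For example (the paths and table name here are hypothetical), the two variants look like this:

-- from the local file system (the file is copied):
hive> LOAD DATA LOCAL INPATH '/home/user/sales.csv' INTO TABLE sales;
-- from HDFS (the file is moved into the table's directory):
hive> LOAD DATA INPATH '/user/hive/staging/sales.csv' INTO TABLE sales;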

How do I import data into Hadoop Hive?

Navigate to the file you want to import, right-click it, select Import to Apache Hive, and select how to import it: Import as CSV, Import as Apache Avro, or Import as Apache Parquet. Provide import details. For Import as CSV, provide values on each tab of the Create a new job wizard and then click Create.


2 Answers

Let me walk you through the following simple steps:

First, create a table in Hive using the field names in your .csv file. Let's say, for example, your .csv file contains three fields (id, name, salary) and you want to create a table in Hive called "Staff". Use the code below to create the table:

hive> CREATE TABLE Staff (id int, name string, salary double) row format delimited fields terminated by ','; 
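If your .csv file starts with a header row, one workaround (a sketch, assuming Hive 0.13+ where this table property is available) is to tell Hive to skip that line when reading:

hive> CREATE TABLE Staff (id int, name string, salary double) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' TBLPROPERTIES ("skip.header.line.count"="1");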

Second, now that your table is created in Hive, load the data from your .csv file into the "Staff" table:

hive>  LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' OVERWRITE INTO TABLE Staff; 

Lastly, display the contents of your "Staff" table in Hive to check that the data was loaded successfully:

hive> SELECT * FROM Staff; 
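If the file is large, printing every row is impractical; a quick sanity check is to compare the row count against the number of lines in your .csv:

hive> SELECT COUNT(*) FROM Staff;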

Thanks.

answered Oct 09 '22 by Adewole Kayode


If you have a Hive setup, you can load a local dataset directly into HDFS/S3 using the Hive LOAD command.

You will need to use the LOCAL keyword when writing your LOAD command.

Syntax for the Hive LOAD command:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] 
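For instance, loading one day's file into a partitioned table might look like this (the table, column, and path are hypothetical; the table must have been created with PARTITIONED BY (ds string)):

hive> LOAD DATA LOCAL INPATH '/home/user/sales-2013-10-11.csv' OVERWRITE INTO TABLE sales PARTITION (ds='2013-10-11');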

Refer to the link below for more detailed information: https://cwiki.apache.org/confluence/display/Hive/LanguageManual%20DML#LanguageManualDML-Loadingfilesintotables

answered Oct 09 '22 by hjamali52