 

Hadoop/Hive: Loading data from a .csv on a local machine

As this is coming from a newbie...

I had Hadoop and Hive set up for me, so I can run Hive queries on my computer against data on an AWS cluster. Can I run Hive queries on .csv data stored on my computer, like I did with MS SQL Server?

How do I load .csv data into Hive then? What does it have to do with Hadoop, and which mode should I run it in?

What settings should I care about so that, if I do something wrong, I can always go back and run queries on Amazon without compromising what was set up for me earlier?

asked Oct 11 '13 by mel

People also ask

How do I load local Hive data?

LOAD DATA [LOCAL] INPATH '<The table data location>' [OVERWRITE] INTO TABLE <table_name>; Note: the LOCAL switch specifies that the data being loaded is available in our local file system. If the LOCAL switch is not used, Hive will treat the location as an HDFS path.
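For example (the paths and table name here are hypothetical), the two variants look like this:

-- from the local file system (the file is copied):
hive> LOAD DATA LOCAL INPATH '/home/user/sales.csv' INTO TABLE sales;
-- from HDFS (the file is moved into the table's directory):
hive> LOAD DATA INPATH '/user/hive/staging/sales.csv' INTO TABLE sales;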

How do I import data into Hadoop Hive?

Navigate to the file you want to import, right-click it, select Import to Apache Hive, and select how to import it: Import as CSV, Import as Apache Avro, or Import as Apache Parquet. Provide import details. For Import as CSV, provide values on each tab of the Create a new job wizard and then click Create.


2 Answers

Let me walk you through the following simple steps:

First, create a table in Hive using the field names in your .csv file. Let's say, for example, your .csv file contains three fields (id, name, salary) and you want to create a table in Hive called "Staff". Use the code below to create the table:

hive> CREATE TABLE Staff (id int, name string, salary double) row format delimited fields terminated by ','; 
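If your .csv file starts with a header row, one workaround (a sketch, assuming Hive 0.13+ where this table property is available) is to tell Hive to skip that line when reading:

hive> CREATE TABLE Staff (id int, name string, salary double) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' TBLPROPERTIES ("skip.header.line.count"="1");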

Second, now that your table is created in Hive, load the data from your .csv file into the "Staff" table:

hive>  LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' OVERWRITE INTO TABLE Staff; 

Lastly, display the contents of your "Staff" table in Hive to check that the data was loaded successfully:

hive> SELECT * FROM Staff; 
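If the file is large, printing every row is impractical; a quick sanity check is to compare the row count against the number of lines in your .csv:

hive> SELECT COUNT(*) FROM Staff;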

Thanks.

answered Oct 09 '22 by Adewole Kayode


If you have a Hive setup, you can load a local dataset directly into HDFS/S3 using the Hive LOAD command.

You will need to use the LOCAL keyword when writing your LOAD command.

Syntax for the Hive LOAD command:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] 
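For instance, loading one day's file into a partitioned table might look like this (the table, column, and path are hypothetical; the table must have been created with PARTITIONED BY (ds string)):

hive> LOAD DATA LOCAL INPATH '/home/user/sales-2013-10-11.csv' OVERWRITE INTO TABLE sales PARTITION (ds='2013-10-11');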

Refer to the link below for more detailed information: https://cwiki.apache.org/confluence/display/Hive/LanguageManual%20DML#LanguageManualDML-Loadingfilesintotables

answered Oct 09 '22 by hjamali52