
Loading CSV file on Hive Table with String Array

Tags: csv, hadoop, hive

I am trying to load a CSV file into Hive where one field is an array of strings.

Here is the CSV file:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]
99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]

I tried creating the table like this:

CREATE TABLE IF NOT EXISTS Article
(
ARTICLE_ID INT,
ARTICLE_NAME STRING,
ARTICLE_AUTHOR STRING,
ARTICLE_GENRE ARRAY<STRING>
);
LOAD DATA INPATH '/tmp/pinterest/article.csv' OVERWRITE INTO TABLE Article;
select * from Article;  

Here is the output I get:

article.article_id  article.article_name    article.article_author  article.article_genre
48  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Health&Fitness"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Photo"]

It's taking only one value in the last field, article_genre.

Can someone point out what is wrong here?

Asked Nov 29 '15 at 16:11 by Deepesh Shetty




1 Answer

A couple of things:
You are missing the delimiter definition for collection items.
Also, I assume you expect your select * from article statement to return something like this:

48  Snacks that Power Up Weight Loss    Aidan B. Prince ["Health&Fitness","Travel"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["Photo","Travel"]

I can give you an example, and you can adapt the rest. Here is my table definition:

create table article (
  id int,
  name string,
  author string,
  genre array<string>
)
row format delimited
fields terminated by ','
collection items terminated by '|';

And here is the data:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,Health&Fitness|Travel
99,Snacks that Power Up Weight Loss,Aidan B. Prince,Photo|Travel

Now load the file:
LOAD DATA LOCAL INPATH '/path' OVERWRITE INTO TABLE article;
Then run a select statement to check the result.
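
To check the result, a few queries along these lines may help. This is only a sketch against the article table defined above; the aliases (first_genre, genre_count, genre_view, g) are illustrative names, not part of the original answer.

```sql
-- Whole rows: genre should now hold both values instead of one
SELECT * FROM article;

-- Arrays are zero-indexed; size() returns the number of elements
SELECT id, genre[0] AS first_genre, size(genre) AS genre_count
FROM article;

-- One output row per (article, genre) pair
SELECT id, g AS single_genre
FROM article
LATERAL VIEW explode(genre) genre_view AS g;
```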

The most important point:
Define a delimiter for collection items, and don't write the array in the file with the bracket syntax you would use in a programming language.
Also, try to make the field delimiter different from the collection items delimiter to avoid confusion and unexpected results.
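
If regenerating the CSV is not practical, one possible workaround is to load each line as a single string and build the array inside Hive. The sketch below assumes the bracketed file layout from the question, that the first three fields never contain commas, and that the article table above already exists. The names article_raw and line are hypothetical.

```sql
-- Hypothetical staging table: with the default field delimiter (Ctrl-A),
-- each CSV line lands unsplit in the single string column.
CREATE TABLE IF NOT EXISTS article_raw (line STRING);

LOAD DATA INPATH '/tmp/pinterest/article.csv' OVERWRITE INTO TABLE article_raw;

-- Split the first three fields on commas, then pull the text between
-- the square brackets and split it into an array<string>.
INSERT OVERWRITE TABLE article
SELECT
  CAST(split(line, ',')[0] AS INT),
  split(line, ',')[1],
  split(line, ',')[2],
  split(regexp_extract(line, '\\[(.*)\\]', 1), ',')
FROM article_raw;
```

This keeps the source file untouched in format, at the cost of an extra table and a pass over the data, so rewriting the file with a distinct collection delimiter is likely the simpler route when it is an option.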

Answered Oct 18 '22 at 02:10 by Chandra kant