How to handle fields enclosed within quotes (CSV) when importing data from S3 into DynamoDB using EMR/Hive

Tags: amazon-web-services, amazon-s3, amazon-dynamodb, hive, amazon-emr

I am trying to use EMR/Hive to import data from S3 into DynamoDB. My CSV file has fields that are enclosed in double quotes and separated by commas. When creating the external table in Hive, I can specify the delimiter as a comma, but how do I specify that the fields are enclosed in quotes?

If I don't specify it, the values in DynamoDB end up wrapped in two pairs of double quotes (""value""), which seems wrong.

I am using the following command to create the external table. Is there a way to specify that fields are enclosed in double quotes?

CREATE EXTERNAL TABLE emrS3_import_1(col1 string, col2 string, col3 string, col4 string)  ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' LOCATION 's3://emrTest/folder';

Any suggestions would be appreciated. Thanks, Jitendra

asked Dec 27 '12 21:12

RandomQuestion

2 Answers

I was stuck on the same issue: my fields are enclosed in double quotes and separated by semicolons (;). My table name is employee1.

After some searching I found a solution that works for this.

You need a SerDe. Download the SerDe jar from this link: https://github.com/downloads/IllyaYalovyy/csv-serde/csv-serde-0.9.1.jar

Then follow the steps below at the Hive prompt:

add jar path/to/csv-serde.jar;

create table employee1(id string, name string, addr string)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\;",
"quoteChar" = "\"")
stored as textfile
;

Then load the data from your path using the query below:

load data local inpath 'path/xyz.csv' into table employee1;

Then run:

select * from employee1;

The values should now come back without the enclosing quotes. Thanks.
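For the S3 case in the question, the same SerDe can presumably be attached to an external table instead of loading a local file. This is only a sketch under a few assumptions: the jar path is a placeholder, the separator is a comma as described in the question, and the table name, columns, and S3 location are copied from the question's own statement.

```sql
-- Register the SerDe jar (placeholder path, adjust to where the jar was saved).
ADD JAR path/to/csv-serde.jar;

-- External table over the S3 folder from the question.
-- The SerDe is asked to split on commas and to treat double quotes as enclosing characters.
CREATE EXTERNAL TABLE emrS3_import_1 (col1 string, col2 string, col3 string, col4 string)
ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
LOCATION 's3://emrTest/folder';

-- Spot-check a few rows before writing anything to DynamoDB.
SELECT * FROM emrS3_import_1 LIMIT 10;
```

Checking a handful of rows first makes it easy to confirm that the quotes are gone before the data is pushed anywhere else.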

answered Sep 16 '22 15:09

Cast_A_Way

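The following code solved the same type of problem for me. It uses the OpenCSVSerde that is built into Hive (0.14 and later), so no extra jar is needed. Note that this SerDe reads every column as a string.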

CREATE TABLE TableRowCSV2 (
    CODE STRING,
    PRODUCTCODE STRING,
    PRICE STRING
)
COMMENT 'row data csv'
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    "separatorChar" = "\,",
    "quoteChar"     = "\""
)
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count" = "1");
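To complete the import described in the question, the parsed rows still have to be written to DynamoDB. The following is a rough sketch, assuming the EMR DynamoDB storage handler is available on the cluster. The Hive table name `dynamo_export`, the DynamoDB table name `MyDynamoTable`, and the attribute names in the column mapping are hypothetical; the source table name, columns, and S3 location are taken from the question.

```sql
-- Source: external table over the quoted CSV files in S3 (names from the question).
CREATE EXTERNAL TABLE emrS3_import_1 (col1 string, col2 string, col3 string, col4 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
LOCATION 's3://emrTest/folder';

-- Target: Hive table backed by DynamoDB.
-- "MyDynamoTable" and the attribute names in the mapping are hypothetical.
CREATE EXTERNAL TABLE dynamo_export (col1 string, col2 string, col3 string, col4 string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name"     = "MyDynamoTable",
  "dynamodb.column.mapping" = "col1:col1,col2:col2,col3:col3,col4:col4"
);

-- Copy the unquoted values from S3 into DynamoDB.
INSERT OVERWRITE TABLE dynamo_export
SELECT * FROM emrS3_import_1;
```

Because the SerDe removes the enclosing quotes while reading, the values written by the final statement should no longer carry the extra double quotes mentioned in the question.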

answered Sep 18 '22 15:09

Shankar
