Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove double quotes " while loading data to Amazon Redshift Spectrum

I want to load data to amazon redshift external table. Data is in CSV format and has quotes. Do we have something like REMOVEQUOTES which we have in copy command for redshift external tables. Also what are different options to load fixed length data in external table.

like image 882
SauravT Avatar asked Mar 09 '23 18:03

SauravT


1 Answers

To create an external Spectrum table, you should reference the CREATE TABLE syntax provided by Athena. To load a CSV escaped by double quotes, you should use the following lines as your ROW FORMAT

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar' = '\"',
    'escapeChar' = '\\'
)

For fixed length files, you should use the RegexSerDe. In this case, the relevant portion of your CREATE TABLE statement will look like this (assuming 3 fields of length 100).

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.{100})(.{100})(.{100})")
like image 197
Andrew Jones Avatar answered Apr 08 '23 14:04

Andrew Jones