How to handle embed line breaks in AWS Athena

Question

I have created a table in AWS Athena like this:

CREATE EXTERNAL TABLE IF NOT EXISTS default.test_line_breaks (
  col1 string, 
  col2 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
 'separatorChar' = ',',
 'quoteChar' = '\"',
 'escapeChar' = '\'
)
STORED AS TEXTFILE
LOCATION 's3://bucket/test/'

In the bucket I put a simple CSV file with the following context:

rec1 col1,rec2 col2
rec2 col1,"rec2, col2"
rec3 col1,"rec3
col2"

When I run data preview request SELECT * FROM "default"."test_line_breaks" limit 10; then Athena returns the following response: Athena query response

How should I set ROW FORMAT to properly handle line breaks within the field values? So that rec3 col2 appears in col2.

Zerodf · Accepted Answer

The problem here is that the OpenCSV Serializer-Deserializer

Does not support embedded line breaks in CSV files.

See this documentation from AWS.

However, it might be possible to use RegexSerDe. Just remember that this Deserializer will take "Java Flavored" Regex. So be sure to use an online Regex tool that supports that syntax in your debugging.

Edit: Still working on the syntax for dealing with the embedded line feed . However, here is a sample that handles two columns with optional quotes. The following regex "*([^"]*)"*,"*([^"]*)"* worked on your line with the embedded return carriage. However, I think the Presto Engine is only feeding it rec3 col1,"rec3. I continue working on it.

CREATE EXTERNAL TABLE IF NOT EXISTS default.test_line_breaks (
  col1 string, 
  col2 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = '"*([^"]*)"*,"*([^"]*)"*'
)
STORED AS TEXTFILE
LOCATION 's3://.../47936191';

How to handle embed line breaks in AWS Athena

Tags:

amazon-web-services

amazon-athena

antklim

1 Answers

Zerodf

Recent Activity

Donate For Us

How to handle embed line breaks in AWS Athena

Tags:

amazon-web-services

amazon-athena

antklim

1 Answers

Zerodf

Related questions

Recent Activity

Donate For Us