Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Skipping header rows in AWS Redshift External Tables

I have a file in S3 with the following data:

name,age,gender
jill,30,f
jack,32,m

And a redshift external table to query that data using spectrum:

create external table spectrum.customers ( 
 "name" varchar(50),
 "age" int,
 "gender" varchar(1))
row format delimited
fields terminated by ','
lines terminated by \n'
stored as textfile 
location 's3://...';

When querying the data I get the following result:

select * from spectrum.customers;
name,age,g
jill,30,f
jack,32,m

Is there an elegant way to skip the header row as part of the external table definition, similar to the tblproperties ("skip.header.line.count"="1") option in Hive? Or is my only option (at least for now) to filter out the header rows as part of the select statement?

like image 412
fez Avatar asked Dec 03 '22 12:12

fez


2 Answers

Answered this in: How to skip headers when we are reading data from a csv file in s3 and creating a table in aws athena.

This works in Redshift:

You want to use table properties ('skip.header.line.count'='1') Along with other properties if you want, e.g. 'numRows'='100'. Here's a sample:

create external table exreddb1.test_table
(ID BIGINT 
,NAME VARCHAR
)
row format delimited
fields terminated by ','
stored as textfile
location 's3://mybucket/myfolder/'
table properties ('numRows'='100', 'skip.header.line.count'='1');
like image 170
TheWalkingData Avatar answered Dec 20 '22 19:12

TheWalkingData


Currently, AWS Redshift Spectrum does not support skipping header rows. If you can, you could raise a support issue that would allow tracking the availability of this feature.

It would be possible to forward this request to the development team for consideration.

like image 36
grundprinzip Avatar answered Dec 20 '22 20:12

grundprinzip