 

Running a COPY command to load gzipped data from S3 into Redshift

When I run my COPY command to load all the files from an S3 folder into a Redshift table, it fails with "ERROR: gzip: unexpected end of stream. Unknown zlib error code. zlib error code: -1":

copy table_name 
    (column_list)
from 's3://bucket_name/folder_name/'
     credentials 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxxxxx'
     delimiter '|' GZIP

However when I specify a file prefix for each of the files within the folder it succeeds:

copy table_name 
    (column_list)
from 's3://bucket_name/folder_name/file_prefix'
     credentials 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxxxxx'
     delimiter '|' GZIP

The files are GZIP-ed.

The AWS documentation does not explicitly state whether specifying just the folder name is enough for the COPY command to load the entire contents of that folder, but in practice I do get an error when I try it.

Has anyone encountered a similar issue? Is a file prefix required for this kind of operation?
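For what it's worth, Redshift's COPY treats the FROM path as a key prefix, not a directory: every object whose key starts with that string gets loaded. A small sketch of that matching behavior (the key names here are made up for illustration):

```python
# Hypothetical object keys under the bucket; names are illustrative only.
keys = [
    "folder_name/file_prefix_001.gz",
    "folder_name/file_prefix_002.gz",
    "folder_name/.s3browser_placeholder",  # stray helper file some tools create
]

def matched_by_copy(keys, prefix):
    """COPY loads every object whose key starts with the given prefix."""
    return [k for k in keys if k.startswith(prefix)]

# The bare folder prefix matches every object, stray files included.
print(matched_by_copy(keys, "folder_name/"))
# A file prefix narrows the match to the real data files only.
print(matched_by_copy(keys, "folder_name/file_prefix"))
```

So both forms of the command are valid; the folder form simply sweeps up anything else that happens to share the prefix.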

and_apo asked Mar 18 '23

2 Answers

One of your gzipped files is not properly formed. The gzip format writes a trailer (a CRC-32 checksum and the uncompressed length) at the end of the file, and the archive cannot be verified or fully expanded without it.

If the file does not get fully written, e.g., you run out of disk space, then you get the error you're seeing when you attempt to load it into Redshift.

Speaking from experience… ;-)
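The truncation failure is easy to reproduce locally. A minimal sketch using Python's standard `gzip` module, simulating a partially written file:

```python
import gzip

# Compress a small payload, then cut it in half to simulate a file whose
# writer died partway through (e.g. ran out of disk space).
payload = b"col1|col2\n" * 100
complete = gzip.compress(payload)
truncated = complete[: len(complete) // 2]  # tail (and CRC/length trailer) lost

# The intact file round-trips fine.
assert gzip.decompress(complete) == payload

# The truncated file fails to decompress -- the same condition Redshift
# reports as "unexpected end of stream".
try:
    gzip.decompress(truncated)
except (EOFError, OSError) as exc:
    print("decompression failed:", exc)
```

Running `gzip -t <file>` on each input before loading catches this class of problem up front.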

Joe Harris answered Apr 01 '23


I encountered the same issue, but in my case the gzip files themselves were fine: the COPY command worked when given an exact file name.

The issue was caused by the application "S3 Browser". When you create directories with it, it creates some extra hidden files inside them. When the COPY command reads all the files under the directory prefix, it also tries to read those hidden files, which are not valid gzip, and throws the error.
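Before loading, you can scan a local copy of the folder and flag anything that is not a valid gzip archive. A sketch (the file names below are invented for the demo):

```python
import gzip
import os
import tempfile

def invalid_gzip_files(directory):
    """Return names of files in `directory` that are not valid gzip archives --
    the kind of stray object that makes a folder-wide COPY fail."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            with gzip.open(path, "rb") as fh:
                while fh.read(64 * 1024):  # read to EOF so the trailer is checked
                    pass
        except (OSError, EOFError):
            bad.append(name)
    return bad

# Demo: one good data file plus a stray non-gzip placeholder file.
tmp = tempfile.mkdtemp()
with gzip.open(os.path.join(tmp, "data_001.gz"), "wb") as fh:
    fh.write(b"a|b\n")
with open(os.path.join(tmp, ".placeholder"), "wb") as fh:
    fh.write(b"not a gzip file")

print(invalid_gzip_files(tmp))  # only the placeholder is flagged
```

The same idea works against S3 directly by listing the keys under the prefix first; anything that doesn't look like a data file is a candidate culprit.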

user217869 answered Apr 01 '23