 

Running a COPY command to load gzipped data from S3 into Redshift

When I run my COPY command to load all the files from an S3 folder into a Redshift table, it fails with "ERROR: gzip: unexpected end of stream. Unknown zlib error code. zlib error code: -1":

copy table_name 
    (column_list)
from 's3://bucket_name/folder_name/'
     credentials 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxxxxx'
     delimiter '|' GZIP

However when I specify a file prefix for each of the files within the folder it succeeds:

copy table_name 
    (column_list)
from 's3://bucket_name/folder_name/file_prefix'
     credentials 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxxxxx'
     delimiter '|' GZIP

The files are GZIP-ed.

The AWS documentation does not explicitly state whether specifying just the folder name is enough for the COPY command to load the entire contents of that folder, but in practice I do get an error when I try it.

Has anyone encountered a similar issue? Is a file prefix required for this kind of operation?
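For what it's worth, Redshift's COPY treats the FROM path as a key prefix, not a directory: every object whose key starts with that string gets loaded. A small sketch of that matching behavior (the key names here are made up for illustration):

```python
# Hypothetical object keys under the bucket; names are illustrative only.
keys = [
    "folder_name/file_prefix_001.gz",
    "folder_name/file_prefix_002.gz",
    "folder_name/.s3browser_placeholder",  # stray helper file some tools create
]

def matched_by_copy(keys, prefix):
    """COPY loads every object whose key starts with the given prefix."""
    return [k for k in keys if k.startswith(prefix)]

# The bare folder prefix matches every object, stray files included.
print(matched_by_copy(keys, "folder_name/"))
# A file prefix narrows the match to the real data files only.
print(matched_by_copy(keys, "folder_name/file_prefix"))
```

So both forms of the command are valid; the folder form simply sweeps up anything else that happens to share the prefix.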

and_apo asked Mar 18 '23

2 Answers

One of your gzipped files is not properly formed. The gzip format writes a trailer (a CRC-32 checksum and the uncompressed length) at the end of the file, and the archive cannot be verified or fully expanded without it.

If the file does not get fully written, e.g., you run out of disk space, then you get the error you're seeing when you attempt to load it into Redshift.

Speaking from experience… ;-)
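The truncation failure is easy to reproduce locally. A minimal sketch using Python's standard `gzip` module, simulating a partially written file:

```python
import gzip

# Compress a small payload, then cut it in half to simulate a file whose
# writer died partway through (e.g. ran out of disk space).
payload = b"col1|col2\n" * 100
complete = gzip.compress(payload)
truncated = complete[: len(complete) // 2]  # tail (and CRC/length trailer) lost

# The intact file round-trips fine.
assert gzip.decompress(complete) == payload

# The truncated file fails to decompress -- the same condition Redshift
# reports as "unexpected end of stream".
try:
    gzip.decompress(truncated)
except (EOFError, OSError) as exc:
    print("decompression failed:", exc)
```

Running `gzip -t <file>` on each input before loading catches this class of problem up front.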

Joe Harris answered Apr 01 '23


I encountered the same issue, but in my case the gzip files themselves were fine: the COPY command worked when given an exact file name.

The issue was caused by the application "S3 Browser". When you create directories with it, it creates some extra hidden files inside them. When the COPY command reads all the files under the directory prefix, it also tries to read those hidden files, which are not valid gzip, and throws the error.
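Before loading, you can scan a local copy of the folder and flag anything that is not a valid gzip archive. A sketch (the file names below are invented for the demo):

```python
import gzip
import os
import tempfile

def invalid_gzip_files(directory):
    """Return names of files in `directory` that are not valid gzip archives --
    the kind of stray object that makes a folder-wide COPY fail."""
    bad = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            with gzip.open(path, "rb") as fh:
                while fh.read(64 * 1024):  # read to EOF so the trailer is checked
                    pass
        except (OSError, EOFError):
            bad.append(name)
    return bad

# Demo: one good data file plus a stray non-gzip placeholder file.
tmp = tempfile.mkdtemp()
with gzip.open(os.path.join(tmp, "data_001.gz"), "wb") as fh:
    fh.write(b"a|b\n")
with open(os.path.join(tmp, ".placeholder"), "wb") as fh:
    fh.write(b"not a gzip file")

print(invalid_gzip_files(tmp))  # only the placeholder is flagged
```

The same idea works against S3 directly by listing the keys under the prefix first; anything that doesn't look like a data file is a candidate culprit.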

user217869 answered Apr 01 '23