When I run my COPY command to copy all the files from an S3 folder to a Redshift table, it fails with "ERROR: gzip: unexpected end of stream. Unknown zlib error code. zlib error code: -1":
copy table_name
(column_list)
from 's3://bucket_name/folder_name/'
credentials 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxxxxx'
delimiter '|' GZIP
However, when I specify a file prefix that matches the files within the folder, it succeeds:
copy table_name
(column_list)
from 's3://bucket_name/folder_name/file_prefix'
credentials 'aws_access_key_id=xxxxxx;aws_secret_access_key=xxxxxxxxx'
delimiter '|' GZIP
The files are gzipped.
The AWS documentation doesn't explicitly say whether specifying just the folder_name is enough for the COPY command to load the entire contents of that folder, but I do get an error when I try.
Has anyone encountered similar issues? Is a file prefix required for this kind of operation?
One of your gzipped files is not properly formed. gzip writes a trailer (the CRC and the uncompressed size) at the end of the file, and the stream can't be verified or fully expanded without it.
If the file does not get fully written, e.g., you run out of disk space, then you get the error you're seeing when you attempt to load it into Redshift.
Speaking from experience… ;-)
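A quick way to find the offending file is to try to fully decompress each one before loading. A minimal sketch in Python, assuming the files have been synced to a local directory (the directory path is a placeholder); `gzip -t` on the command line does the same job:

import gzip
import pathlib

# Fully decompress each file; a truncated gzip stream raises EOFError
# ("Compressed file ended before the end-of-stream marker was reached")
# or gzip.BadGzipFile for a malformed header.
for path in pathlib.Path("downloads/folder_name").glob("*.gz"):
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1024 * 1024):  # read in 1 MiB chunks
                pass
    except (EOFError, gzip.BadGzipFile, OSError) as exc:
        print(f"BAD  {path}: {exc}")
    else:
        print(f"OK   {path}")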
I encountered the same issue, and in my case the gzip files were fine: the COPY command worked when given an exact file name.
The issue was caused by the application "S3 Browser". When you create directories with it, it creates some extra hidden files in them, and when the COPY command reads the directory it picks up those hidden, invalid gzip files and throws the error.
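Because COPY loads every object whose key starts with the given prefix, you can spot such stray objects by listing everything under the prefix. A minimal sketch using boto3 (bucket and prefix names are placeholders taken from the question):

import boto3

s3 = boto3.client("s3")

# List every object COPY would match under the prefix and flag anything
# that doesn't look like a legitimate data file (zero-byte "directory"
# markers, hidden helper files, non-.gz objects).
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="bucket_name", Prefix="folder_name/"):
    for obj in page.get("Contents", []):
        key, size = obj["Key"], obj["Size"]
        suspicious = size == 0 or not key.endswith(".gz")
        print(f"{'SUSPECT' if suspicious else 'ok':7} {size:>10}  {key}")

Deleting the flagged objects (or pointing COPY at a prefix that only matches the real data files, as in the question) avoids the error.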