How to load data from split gzip files into Redshift?

Can I load data from split gzip files into an Amazon Redshift table?

I can load data from a single gzip file, or from split text files.

But can I load data from split gzip files?

Luniam asked Sep 15 '25 15:09


1 Answer

I'm assuming here that you mean that you have multiple CSV files that are each gzipped.

First, upload each file to the same S3 bucket under a common key prefix, for example:

s3://S3_BUCKET/S3_PREFIX/file0.gz
s3://S3_BUCKET/S3_PREFIX/file1.gz
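If you are starting from one large CSV, the split-and-compress step can be done before uploading. The following is a minimal stdlib-only sketch. It assumes the CSV has no header row and no quoted fields that contain embedded newlines, so cutting on line boundaries keeps every record intact. The helper names and the `fileN.gz` naming scheme are hypothetical and simply mirror the example keys above.

```python
from __future__ import annotations

import gzip


def split_to_gzip_parts(csv_text: str, num_parts: int) -> list[bytes]:
    """Distribute CSV lines round-robin into num_parts gzip-compressed blobs.

    Assumes no header row and no multi-line quoted fields.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    lines = csv_text.splitlines(keepends=True)
    buckets: list[list[str]] = [[] for _ in range(num_parts)]
    for i, line in enumerate(lines):
        buckets[i % num_parts].append(line)
    # Each bucket becomes one independent .gz file, which COPY ... GZIP can read.
    return [gzip.compress("".join(b).encode("utf-8")) for b in buckets]


def part_keys(prefix: str, num_parts: int) -> list[str]:
    # Hypothetical naming scheme: PREFIX/file0.gz, PREFIX/file1.gz, ...
    return [f"{prefix}/file{i}.gz" for i in range(num_parts)]


parts = split_to_gzip_parts("1,a\n2,b\n3,c\n", 2)
for key, blob in zip(part_keys("S3_PREFIX", 2), parts):
    print(key, len(blob), "compressed bytes")
```

To upload the parts with boto3, create a client with `s3 = boto3.client("s3")` and call `s3.put_object(Bucket="S3_BUCKET", Key=key, Body=blob)` for each pair. The bucket name is the same placeholder used above.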

Then, execute the Redshift copy command:

copy TABLE_NAME
from 's3://S3_BUCKET/S3_PREFIX'
credentials 'aws_access_key_id=ACCESS_KEY;aws_secret_access_key=SECRET_KEY'
csv
gzip;

  • specify the S3 bucket and key prefix (COPY loads every object whose key begins with that prefix)
  • include credentials that have permission to read the S3 objects
  • ensure that the destination table already exists and has columns compatible with the CSV
  • specify the csv and gzip options
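The prefix in the `from` clause is matched as a plain string, not as a folder name. Any object whose key starts with it is loaded, including unrelated files that happen to share the same leading characters. The sketch below is a small pure-Python model of that matching rule. It does not call AWS, and the extra `S3_PREFIX_old` key is a hypothetical stale upload. It shows why ending the prefix with `/` restricts the load to the intended files.

```python
from __future__ import annotations


def keys_loaded_by_prefix(prefix: str, keys: list[str]) -> list[str]:
    """Model which object keys COPY FROM 's3://S3_BUCKET/<prefix>' would match.

    The path is treated as a plain key prefix, so every key that starts
    with the prefix string is included.
    """
    return [key for key in keys if key.startswith(prefix)]


keys = [
    "S3_PREFIX/file0.gz",
    "S3_PREFIX/file1.gz",
    "S3_PREFIX_old/file0.gz",  # hypothetical stale upload sharing the prefix
]

print(keys_loaded_by_prefix("S3_PREFIX", keys))   # all three keys
print(keys_loaded_by_prefix("S3_PREFIX/", keys))  # only the two intended files
```

In the SQL above, this corresponds to writing `from 's3://S3_BUCKET/S3_PREFIX/'` with a trailing slash.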

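Also, it's a good idea to make the number of files a multiple of the number of slices in your cluster, and to keep the files roughly equal in size. That way every slice loads an equal share of the data in parallel.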
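You can get the slice count by running `select count(*) from stv_slices;` on the cluster. Picking a file count from it is simple arithmetic. The sketch below rounds a desired minimum number of files up to the next multiple of the slice count. `min_files` is a hypothetical input; you might derive it from your total data size divided by a target file size of your choosing.

```python
import math


def recommended_file_count(num_slices: int, min_files: int = 1) -> int:
    """Return the smallest multiple of num_slices that is >= min_files.

    num_slices: result of `select count(*) from stv_slices;`
    min_files:  hypothetical lower bound, e.g. total size / target file size
    """
    if num_slices < 1:
        raise ValueError("num_slices must be positive")
    multiples = max(1, math.ceil(min_files / num_slices))
    return multiples * num_slices


print(recommended_file_count(4))      # one file per slice
print(recommended_file_count(4, 10))  # rounded up to a multiple of 4
```

Rounding up rather than down keeps each file from becoming larger than planned, and every slice still receives the same number of files.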

Aaron Perrin answered Sep 17 '25 04:09