
Insert Zipped File into RedShift

I have a zipped file in S3 that I would like to insert into a Redshift database. The only way my research has found to do this is to launch an EC2 instance, move the file there, unzip it, send it back to S3, and then insert it into my Redshift table. But I am trying to do all of this with the Java SDK from an outside machine, and I do not want to use an EC2 instance. Is there a way to have an EMR job unzip the file, or to insert the zipped file directly into Redshift?

The files are .zip, not .gzip.

Dan Ciborowski - MSFT asked Jul 19 '13 13:07





2 Answers

As per Guy's comment, you cannot directly insert a zipped file into Redshift.

Assuming this is not a one-time task, I would suggest using AWS Data Pipeline to perform this work. See this example of copying data between S3 buckets. Modify the example to unzip and then gzip your data instead of simply copying it.

Use the ShellCommandActivity to execute a shell script that performs the work. I would assume this script could invoke Java if you choose an appropriate AMI as your EC2 resource (YMMV).

Data Pipeline is highly efficient for this type of work because it starts and terminates the EC2 resource automatically, and you do not have to worry about discovering the name of the new instance in your scripts.
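If the script does invoke Java, the unzip-then-gzip step could look roughly like the sketch below. It uses only `java.util.zip` from the standard library. The class and method names (`ZipToGzip`, `recompress`) are made up for illustration, and it assumes the archive holds a single data file. Fetching the .zip from S3 and uploading the result are left out; the two streams would come from whatever S3 client you use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: not part of any AWS SDK.
final class ZipToGzip {

    /**
     * Reads the first file entry from a .zip stream and writes its contents
     * gzip-compressed to gzipOut. Both streams are closed when this returns.
     *
     * @return the name of the entry that was recompressed, or null if the
     *         archive contained no file entries
     */
    static String recompress(InputStream zipIn, OutputStream gzipOut) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(zipIn)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // skip directory entries, we only want data
                }
                // Assumption: one data file per archive, so stop after the first.
                try (GZIPOutputStream gos = new GZIPOutputStream(gzipOut)) {
                    byte[] buffer = new byte[8192];
                    int n;
                    // ZipInputStream.read returns -1 at the end of the current entry.
                    while ((n = zis.read(buffer)) != -1) {
                        gos.write(buffer, 0, n);
                    }
                }
                return entry.getName();
            }
        }
        return null;
    }
}
```

Because the data is streamed through a small buffer, the file is never held fully in memory, so the same code could run on the outside machine as well as on the pipeline's EC2 resource.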

Joe Harris answered Jan 02 '23 22:01



Add the GZIP option to the COPY command; please refer to: http://docs.aws.amazon.com/redshift/latest/dg/c_loading-encrypted-files.html. You can use a Java client to execute the SQL.
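As a rough sketch of what that might look like over plain JDBC: the code below builds a `COPY ... GZIP` statement and runs it on an existing connection. Note that GZIP applies to gzip-compressed files, so a .zip archive would need to be recompressed first. The class name, table name, S3 path, role ARN, and the comma delimiter are all placeholders, and authorizing with `IAM_ROLE` is an assumption about your setup.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: names and parameters are illustrative only.
final class RedshiftGzipCopy {

    /**
     * Builds a COPY statement that loads a gzip-compressed, comma-delimited
     * file from S3. Inputs are concatenated directly into the SQL, so they
     * must come from trusted configuration, never from user input.
     */
    static String buildCopySql(String table, String s3Uri, String iamRoleArn) {
        return "COPY " + table
                + " FROM '" + s3Uri + "'"
                + " IAM_ROLE '" + iamRoleArn + "'"
                + " GZIP"
                + " DELIMITER ','";
    }

    /**
     * Executes the COPY on an already-open JDBC connection to the cluster.
     * Obtaining the connection (driver, URL, credentials) is up to the caller.
     */
    static void runCopy(Connection conn, String table, String s3Uri, String iamRoleArn)
            throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(buildCopySql(table, s3Uri, iamRoleArn));
        }
    }
}
```

Redshift reads the file from S3 itself, so the Java client only sends the statement and never transfers the data.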

coderz answered Jan 03 '23 00:01
