I have a zipped file in S3 that I would like to load into a Redshift database. The only approach my research has turned up is to launch an EC2 instance, move the file there, unzip it, send it back to S3, and then load it into my Redshift table. But I am trying to do all of this with the Java SDK from an outside machine and do not want to have to use an EC2 instance. Is there a way to have an EMR job unzip the file, or to load the zipped file directly into Redshift?
The files are .zip, not gzip.
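For what it's worth, here is a rough sketch of the "outside machine" route with the AWS SDK for Java: stream the .zip out of S3, re-compress its contents as gzip (which COPY does understand), and upload the result back to S3. The bucket, keys, and the assumption that the archive holds a single CSV are all placeholders; running the COPY itself is shown in the JDBC example further down.

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipToGzip {
    public static void main(String[] args) throws Exception {
        // Placeholder bucket and keys -- substitute your own.
        String bucket = "my-bucket";
        String zipKey = "incoming/data.zip";
        String gzKey  = "staging/data.csv.gz";

        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // Stream the .zip out of S3 and re-compress its first entry as gzip;
        // COPY understands gzip/lzop/bzip2 but not zip.
        File gzFile = File.createTempFile("data", ".csv.gz");
        try (S3Object zipObject = s3.getObject(bucket, zipKey);
             ZipInputStream zin = new ZipInputStream(zipObject.getObjectContent());
             OutputStream gzOut = new GZIPOutputStream(new FileOutputStream(gzFile))) {
            ZipEntry entry = zin.getNextEntry();   // assumes a single CSV inside the archive
            if (entry == null) {
                throw new IllegalStateException("zip archive is empty: " + zipKey);
            }
            byte[] buf = new byte[8192];
            int n;
            while ((n = zin.read(buf)) > 0) {
                gzOut.write(buf, 0, n);
            }
        }

        // Upload the gzipped copy back to S3; a COPY ... GZIP can then load it.
        s3.putObject(bucket, gzKey, gzFile);
    }
}
```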
The simplest way to insert a row in Redshift is to use the INSERT INTO command and specify values for all columns. If the table has 10 columns, you have to supply 10 values, in the order in which the table was defined.
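A minimal JDBC sketch of that, assuming a hypothetical three-column table created as CREATE TABLE users (id INT, name VARCHAR(50), created DATE), plus a placeholder cluster endpoint and credentials:

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertExample {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials; the Redshift JDBC driver must be on the classpath.
        String url = "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev";
        try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpassword");
             PreparedStatement ps = conn.prepareStatement(
                     // No column list, so the three values must follow the table definition order.
                     "INSERT INTO users VALUES (?, ?, ?)")) {
            ps.setInt(1, 42);
            ps.setString(2, "Alice");
            ps.setDate(3, Date.valueOf("2024-01-01"));
            ps.executeUpdate();
        }
    }
}
```

For bulk loads like the one in the question, though, COPY from S3 is the recommended path rather than row-by-row INSERTs.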
For optimum COPY parallelism, the ideal file size is 1–125 MB after compression.
You cannot load a zipped file directly into Redshift, as per Guy's comment.
Assuming this is not a one-time task, I would suggest using AWS Data Pipeline to perform this work. See this example of copying data between S3 buckets. Modify the example to unzip and then gzip your data instead of simply copying it.
Use the ShellCommandActivity
to execute a shell script that performs the work. I would assume this script could invoke Java if you choose an appropriate AMI as your EC2 resource (YMMV).
Data Pipeline is well suited to this type of work because it starts and terminates the EC2 resource automatically, and you do not have to worry about discovering the name of the new instance in your scripts.
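As a very rough illustration of that setup (not a drop-in definition), the sketch below uses the AWS SDK for Java's Data Pipeline client to register an on-demand pipeline whose ShellCommandActivity downloads the zip, re-compresses it as gzip, and pushes it back to S3 on a transient EC2 resource. All names, buckets, roles, and the shell command itself are placeholders, and a real definition will need whatever IAM roles and scheduling your account requires.

```java
import com.amazonaws.services.datapipeline.DataPipeline;
import com.amazonaws.services.datapipeline.DataPipelineClientBuilder;
import com.amazonaws.services.datapipeline.model.ActivatePipelineRequest;
import com.amazonaws.services.datapipeline.model.CreatePipelineRequest;
import com.amazonaws.services.datapipeline.model.Field;
import com.amazonaws.services.datapipeline.model.PipelineObject;
import com.amazonaws.services.datapipeline.model.PutPipelineDefinitionRequest;

public class UnzipPipeline {
    public static void main(String[] args) {
        DataPipeline dp = DataPipelineClientBuilder.defaultClient();

        // Register an empty pipeline and get its id.
        String pipelineId = dp.createPipeline(new CreatePipelineRequest()
                .withName("unzip-and-gzip")
                .withUniqueId("unzip-and-gzip-1")).getPipelineId();

        // Default object: run on demand, log to S3 (placeholder bucket and roles).
        PipelineObject defaults = new PipelineObject()
                .withId("Default").withName("Default")
                .withFields(
                        new Field().withKey("scheduleType").withStringValue("ondemand"),
                        new Field().withKey("pipelineLogUri").withStringValue("s3://my-bucket/logs/"),
                        new Field().withKey("role").withStringValue("DataPipelineDefaultRole"),
                        new Field().withKey("resourceRole").withStringValue("DataPipelineDefaultResourceRole"));

        // Transient EC2 instance that Data Pipeline starts and terminates for us.
        PipelineObject ec2 = new PipelineObject()
                .withId("Ec2Instance").withName("Ec2Instance")
                .withFields(
                        new Field().withKey("type").withStringValue("Ec2Resource"),
                        new Field().withKey("terminateAfter").withStringValue("30 Minutes"));

        // ShellCommandActivity: pull the zip, re-compress as gzip, push it back.
        String command = "aws s3 cp s3://my-bucket/incoming/data.zip . && "
                       + "unzip -p data.zip > data.csv && "
                       + "gzip data.csv && "
                       + "aws s3 cp data.csv.gz s3://my-bucket/staging/data.csv.gz";
        PipelineObject unzipActivity = new PipelineObject()
                .withId("UnzipActivity").withName("UnzipActivity")
                .withFields(
                        new Field().withKey("type").withStringValue("ShellCommandActivity"),
                        new Field().withKey("command").withStringValue(command),
                        new Field().withKey("runsOn").withRefValue("Ec2Instance"));

        // Push the definition and activate the pipeline.
        dp.putPipelineDefinition(new PutPipelineDefinitionRequest()
                .withPipelineId(pipelineId)
                .withPipelineObjects(defaults, ec2, unzipActivity));
        dp.activatePipeline(new ActivatePipelineRequest().withPipelineId(pipelineId));
    }
}
```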
Add the GZIP option to your COPY command; please refer to: http://docs.aws.amazon.com/redshift/latest/dg/c_loading-encrypted-files.html
We can use the Java client (JDBC) to execute the SQL.
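For example (a sketch with a placeholder table, bucket, cluster endpoint, and IAM role), the gzipped file staged in S3 can be loaded by running COPY ... GZIP through the Redshift JDBC driver:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CopyGzipExample {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint, credentials, table, and IAM role.
        String url = "jdbc:redshift://my-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev";
        String copySql =
                "COPY my_table "
              + "FROM 's3://my-bucket/staging/data.csv.gz' "
              + "IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole' "
              + "CSV GZIP";
        try (Connection conn = DriverManager.getConnection(url, "dbuser", "dbpassword");
             Statement stmt = conn.createStatement()) {
            stmt.execute(copySql);   // COPY decompresses the gzip on the Redshift side
        }
    }
}
```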