I would like to unload data files from Amazon Redshift to Amazon S3 in Apache Parquet format in order to query the files on S3 using Redshift Spectrum. I have searched everywhere, but I couldn't find anything about how to unload files from Amazon Redshift to S3 in Parquet format. Is this feature not supported yet, or was I just unable to find any documentation about it? Could somebody who has worked on this shed some light on it? Thank you.
You need to have your data inside your Amazon S3 bucket first; only from there can you import it into your Amazon Redshift data warehouse, for example using AWS Data Pipeline. This method is broadly divided into two steps: first create an Amazon S3 bucket and upload the Parquet files to it, then load those files into Redshift.
Unloading data to Amazon S3: Amazon Redshift splits the results of a SELECT statement across a set of files, one or more files per node slice, to simplify parallel reloading of the data. Alternatively, you can specify that UNLOAD should write the results serially to one or more files by adding the PARALLEL OFF option.
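For illustration, a serial unload might look like the sketch below (the table name, bucket, and IAM role are placeholders, not values from the question):
UNLOAD ('SELECT * FROM my_table')
TO 's3://my-bucket/unload/my_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
PARALLEL OFF      -- write the files serially instead of one file per node slice
ALLOWOVERWRITE;   -- replace any existing files with the same prefix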
Amazon Redshift Can Now COPY from Parquet and ORC File Formats. You can now COPY Apache Parquet and Apache ORC file formats from Amazon S3 to your Amazon Redshift cluster. Apache Parquet and ORC are columnar data formats that allow users to store their data more efficiently and cost-effectively.
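For reference, that reverse direction (loading Parquet from S3 into an existing Redshift table) is a single COPY statement; the table name, bucket path, and IAM role below are placeholders:
COPY my_table
FROM 's3://my-bucket/parquet-data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS PARQUET;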
Amazon Redshift supports the UNLOAD command, which takes the result of a query and stores the data in Amazon S3. It works in the opposite direction to the COPY command, which grabs data from an Amazon S3 bucket and puts it into an Amazon Redshift table.
Redshift UNLOAD to the Parquet file format has been supported since December 2019:
UNLOAD ('select-statement')
TO 's3://object-path/name-prefix'
FORMAT PARQUET
It is mentioned in the Redshift Features announcement, the UNLOAD documentation has been updated accordingly, and an example is provided in the UNLOAD examples documentation.
Excerpt from the official documentation:
The following example unloads the LINEITEM table in Parquet format, partitioned by the l_shipdate column.
unload ('select * from lineitem')
to 's3://mybucket/lineitem/'
iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
PARQUET
PARTITION BY (l_shipdate);
Assuming four slices, the resulting Parquet files are dynamically partitioned into various folders.
s3://mybucket/lineitem/l_shipdate=1992-01-02/0000_part_00.parquet
                                             0001_part_00.parquet
                                             0002_part_00.parquet
                                             0003_part_00.parquet
s3://mybucket/lineitem/l_shipdate=1992-01-03/0000_part_00.parquet
                                             0001_part_00.parquet
                                             0002_part_00.parquet
                                             0003_part_00.parquet
s3://mybucket/lineitem/l_shipdate=1992-01-04/0000_part_00.parquet
                                             0001_part_00.parquet
                                             0002_part_00.parquet
                                             0003_part_00.parquet
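Since the question mentions Redshift Spectrum, a rough sketch of querying these unloaded files is to define an external table over the same S3 prefix. The schema name and the shortened column list below are assumptions (the real LINEITEM table has more columns), and the external schema is assumed to already exist:
CREATE EXTERNAL TABLE spectrum_schema.lineitem_parquet (
    l_orderkey  BIGINT,
    l_partkey   BIGINT,
    l_quantity  DECIMAL(12,2)
)
PARTITIONED BY (l_shipdate DATE)
STORED AS PARQUET
LOCATION 's3://mybucket/lineitem/';

-- Each partition written by UNLOAD has to be registered before it shows up in queries:
ALTER TABLE spectrum_schema.lineitem_parquet
ADD PARTITION (l_shipdate = '1992-01-02')
LOCATION 's3://mybucket/lineitem/l_shipdate=1992-01-02/';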
You can't do this. Redshift doesn't know about Parquet (although you can read Parquet files through the Spectrum abstraction).
You can UNLOAD to text files. They can be encrypted or compressed, but they are only ever flat text files.
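A minimal sketch of such a text unload (identifiers are placeholders), producing compressed, pipe-delimited flat files:
UNLOAD ('SELECT * FROM my_table')
TO 's3://my-bucket/my_table_'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
DELIMITER '|'
GZIP;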
Looks like this is now supported:
https://aws.amazon.com/about-aws/whats-new/2018/06/amazon-redshift-can-now-copy-from-parquet-and-orc-file-formats/