 

How to safely unload/copy a table in RedShift?

In Redshift it is convenient to use unload/copy to move data to S3 and load it back, but I find it hard to choose the delimiter each time. The right delimiter depends on the content of the table! I had to change the delimiter every time I hit load errors.

For example, when I use the following command to unload/copy a table:

unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes allowoverwrite;

copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes;

I will get a load error if the table happens to have a field whose content contains "||". Then I have to change the delimiter '|' to another one like ',' and try again; if I'm unlucky, it takes multiple tries to succeed.

I'm wondering if there is a way to unload/copy a Redshift table that is independent of the table's content, one that will always succeed no matter what weird strings are stored in the table.

asked Oct 16 '14 by ciphor

People also ask

How do you unload a table in Redshift?

Unload the VENUE table to a JSON file. The following example unloads the VENUE table and writes the data in JSON format to s3://mybucket/unload/:

unload ('select * from venue') to 's3://mybucket/unload/' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' JSON;

How do I export a Redshift table?

The basic syntax to export your data is as below:

UNLOAD ('SELECT * FROM your_table') TO 's3://object-path/name-prefix' IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>' CSV;

On the first line, you query the data you want to export. Be aware that Redshift only allows a LIMIT clause in an inner SELECT statement.
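For example, to export only a sample of rows, the LIMIT has to be nested in an inner SELECT (the table and bucket names here are placeholders):

UNLOAD ('SELECT * FROM (SELECT * FROM your_table LIMIT 10)') TO 's3://object-path/name-prefix' IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>' CSV;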

Can we unload a view in Redshift?

You can unload the result of an Amazon Redshift query to your Amazon S3 data lake in Apache Parquet, an efficient open columnar storage format for analytics. Parquet format is up to 2x faster to unload and consumes up to 6x less storage in Amazon S3, compared with text formats.
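For example, a sketch with placeholder names (my_view, the bucket, and the IAM role are illustrative):

unload ('select * from my_view') to 's3://mybucket/unload/my_view_' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole' FORMAT AS PARQUET;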

Can you remove 000 file extension after a CSV unload from Amazon Redshift to s3?

Yes, that's the naming convention unload uses in order to avoid duplicate names, which I believe cannot be avoided from our end.
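If you need a different name, one workaround (the paths below are illustrative, not from the original answer) is to rename the object after the unload completes with the AWS CLI:

aws s3 mv s3://mybucket/unload/tbl_example000 s3://mybucket/unload/tbl_example.csv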


2 Answers

Finally I figured out the right approach: add the escape option to both the unload and copy commands:

unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes escape allowoverwrite;

copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes escape;

With escape in the unload command, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:

  • Linefeed: \n
  • Carriage return: \r
  • The delimiter character specified for the unloaded data.
  • The escape character: \
  • A quote character: " or ' (if both ESCAPE and ADDQUOTES are specified in the UNLOAD command).
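For illustration, with delimiter '|', addquotes, and escape, a made-up column value such as a|b"c would be written to the unload file as:

"a\|b\"c"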

And with escape in the copy command, the backslash character (\) in input data is treated as an escape character. The character that immediately follows the backslash is loaded into the table as part of the current column value, even if it is a character that normally serves a special purpose. For example, you can use this option to escape the delimiter character, a quote, an embedded newline, or the escape character itself when any of these characters is a legitimate part of a column value.
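In other words, when the same file is loaded back with copy ... removequotes escape, the field "a\|b\"c" from the illustration above becomes the original value a|b"c again, so the round trip succeeds regardless of what the data contains.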

answered Oct 10 '22 by ciphor


Try the unload like below:

unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter as ',' addquotes escape;

To load it back, use the below:

copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter ',' removequotes escape;

This will work irrespective of whether your data has ',' in between.

answered Oct 10 '22 by Sandesh Deshmane