 

How to copy csv data file to Amazon RedShift?

I'm trying to migrate some MySQL tables to Amazon Redshift, but ran into some problems.

The steps are simple:

1. Dump the MySQL table to a csv file
2. Upload the csv file to S3
3. Copy the data file to RedShift
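Step 1 of the list above can be sketched with Python's csv module (the rows and column values here are made up for illustration). QUOTE_MINIMAL produces the quoting convention that Redshift's COPY ... CSV expects: only fields containing the delimiter, a quote character, or a newline get wrapped in double quotes, and embedded quotes are doubled.

```python
import csv
import io

# Sample rows standing in for a MySQL result set (hypothetical data).
rows = [
    (1, "plain value"),
    (2, 'contains "quotes" and, commas'),
]

# Write CSV the way COPY ... CSV parses it by default:
# comma delimiter, double-quote quoting, quotes doubled when embedded.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
print(buf.getvalue())
```

Steps 2 and 3 would then be an `aws s3 cp` upload and the COPY command discussed below.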

Error occurs in step 3:

The SQL command is:

copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ',' csv;

The error info:

An error occurred when executing the SQL command: copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx ERROR: COPY CSV is not supported [SQL State=0A000] Execution time: 0.53s 1 statement(s) failed.

I don't know if there are any limitations on the format of the csv file, e.g. the delimiters and quotes; I cannot find them in the documentation.

Can anyone help?

asked Mar 07 '13 by ciphor



4 Answers

The problem was finally resolved by using:

copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' delimiter ',' removequotes;

More information can be found here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

answered Nov 02 '22 by ciphor


Now Amazon Redshift supports the CSV option for the COPY command. It's better to use this option to import CSV-formatted data correctly. The format is shown below.

COPY [table-name] FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' CSV;

The default delimiter is ( , ) and the default quote character is ( " ). You can also import TSV-formatted data with the CSV and DELIMITER options, like this.

COPY [table-name] FROM 's3://[bucket-name]/[file-path or prefix]'
CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' CSV DELIMITER '\t';

There are some disadvantages to the old way (DELIMITER and REMOVEQUOTES): REMOVEQUOTES does not support a newline or a delimiter character within an enclosed field. If the data can include such characters, you should use the CSV option.
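The difference is easy to demonstrate with Python's csv module as a stand-in for the two parsing strategies (the sample record is made up). Proper CSV parsing, which is what COPY ... CSV does, keeps a quoted field containing the delimiter intact; splitting on the delimiter first, which is roughly how DELIMITER + REMOVEQUOTES behaves, tears the field apart.

```python
import csv
import io

# A record whose second field contains the delimiter inside quotes.
line = '1,"Smith, John",NY\n'

# CSV-style parsing (COPY ... CSV): the embedded comma is preserved
# because the field is quoted -- we get back exactly 3 fields.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)

# Naive delimiter splitting (roughly DELIMITER + REMOVEQUOTES): the
# quoted field is split in two, yielding 4 fields that no longer
# line up with the table's columns.
naive = line.strip().split(",")
print(naive)
```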

See the following link for the details.

http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

answered Nov 02 '22 by Masashi M


If you want to save yourself some code, or you have a very basic use case, you can use Amazon Data Pipeline. It starts a spot instance and performs the transformation within the Amazon network, and it's a really intuitive tool (but very simple, so you can't do complex things with it).

answered Nov 02 '22 by asafm


You can try this:

copy TABLE_A from 's3://ciphor/TABLE_A.csv' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' csv;

CSV itself means comma-separated values, so there's no need to provide a delimiter with it. Please refer to this link:

http://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-format

answered Nov 02 '22 by Dipesh Palod