Is the COPY command in Amazon Redshift atomic or not?

Question

In Amazon Redshift, data is usually loaded from S3 using the COPY command. I want to know whether the command is atomic. For example, is it possible that in some exceptional case only part of a data file is loaded into the Redshift table?

Masashi M · Accepted Answer

The COPY command with default options is atomic. If the file contains an invalid line that causes a load failure, the COPY transaction is rolled back and no data is imported.

If you want to skip invalid lines rather than abort the transaction, use the MAXERROR option of the COPY command, which tolerates invalid lines up to a limit. The following example ignores up to 100 invalid lines.

COPY table_name FROM 's3://[bucket-name]/[file-path or prefix]' CREDENTIALS 'aws_access_key_id=xxxx;aws_secret_access_key=xxxx' DELIMITER '\t' MAXERROR 100;

If the number of invalid lines exceeds the MAXERROR count (100 here), the transaction is rolled back.
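If the COPY statement is generated from application code, the MAXERROR clause can be made optional. The following is only a minimal sketch: `build_copy_sql` and its parameter names are hypothetical, and the values are interpolated without escaping, so it assumes trusted input only.

```python
def build_copy_sql(table, s3_path, credentials, delimiter="\\t", maxerror=None):
    """Return a Redshift COPY statement as a string.

    Hypothetical helper for illustration. Values are inserted as-is
    (no escaping), so pass trusted input only.
    """
    parts = [
        f"COPY {table}",
        f"FROM '{s3_path}'",
        f"CREDENTIALS '{credentials}'",
        f"DELIMITER '{delimiter}'",
    ]
    if maxerror is not None:
        if maxerror < 0:
            raise ValueError("maxerror must be non-negative")
        # Tolerate invalid lines up to this count instead of failing at once.
        parts.append(f"MAXERROR {maxerror}")
    return " ".join(parts) + ";"


sql = build_copy_sql(
    "table_name",
    "s3://[bucket-name]/[file-path or prefix]",
    "aws_access_key_id=xxxx;aws_secret_access_key=xxxx",
    maxerror=100,
)
print(sql)
```

Leaving `maxerror` as `None` omits the clause, which gives the default all-or-nothing behaviour described above.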

See the following link for details of the COPY command: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

Guy · Answer

You can use the NOLOAD option to check for errors before loading the data. This is a faster way to validate the format of your data because it only parses the files and does not load any rows.

You can define how many errors you are willing to tolerate with the MAXERROR option.

If the error count exceeds MAXERROR, the load fails and no records are added.
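One way to combine the two options is to run the same COPY twice: first with NOLOAD as a dry run, then for real. The sketch below assumes `cursor` is a DB-API 2.0 style cursor connected to Redshift and that the driver raises an exception when the NOLOAD pass fails. `validate_then_load` is a hypothetical name, not part of any library.

```python
def validate_then_load(cursor, copy_sql):
    """Run a COPY statement with NOLOAD first, then run the real load.

    Assumption: `cursor.execute` raises if the NOLOAD pass reports a
    failure, so the second statement is never reached in that case.
    """
    base = copy_sql.rstrip().rstrip(";")
    # Dry run: parse and validate the files without loading any rows.
    cursor.execute(base + " NOLOAD;")
    # Real load: executed only when the dry run did not raise.
    cursor.execute(base + ";")


class RecordingCursor:
    """Stand-in cursor that only records the statements it receives."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


cursor = RecordingCursor()
validate_then_load(cursor, "COPY table_name FROM 's3://bucket/prefix' CREDENTIALS 'xxxx' MAXERROR 100;")
for statement in cursor.statements:
    print(statement)
```

With a real connection, the recording cursor would be replaced by the cursor of whichever PostgreSQL-compatible driver is in use.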

See more information here: http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

Tags:

amazon-redshift

ciphor
