Amazon Redshift how to copy from s3 and set a job_id

Question

Amazon Redshift provides the ability to load table data from s3 objects using the "Copy" command. Is their a way to use the copy command, but also set additional "col=CONSTANT" for each inserted row.

I want to set a job_id (which is not in the source data) on each copied row, and I think it would be a shame to have to execute a few million inserts just so each row has a job attribute, when "copy" gets me 99% of the way there with much better performance.

Maybe there is a more clever solution?

Tomasz Tybulewicz · Accepted Answer

If you want all your rows added in a single COPY command to have the same value of job_id, then you may COPY data into staging table, then add job_id column into that table, then insert all data from the staging table into final table like:

CREATE TABLE destination_staging (LIKE destination);
ALTER TABLE destination_staging DROP COLUMN job_id;
COPY destination_staging FROM 's3://data/destination/(...)' (...)
ALTER TABLE destination_staging ADD COLUM job_id INT DEFAULT 42;
INSERT INTO destination SELECT * FROM destination_staging ORDER BY sortkey_column;
DROP TABLE destination_staging;
ANALYZE TABLE destination;
VACUUM destination;

ANALYZE and VACUUM are not necessary, but highly recommended in order to update query analyzer and put all new data into correct positions.

Amazon Redshift how to copy from s3 and set a job_id

Tags:

postgresql

amazon-web-services

amazon-redshift

gbegley

1 Answers

Tomasz Tybulewicz

Recent Activity

Donate For Us

Amazon Redshift how to copy from s3 and set a job_id

Tags:

postgresql

amazon-web-services

amazon-redshift

gbegley

1 Answers

Tomasz Tybulewicz

Related questions

Recent Activity

Donate For Us