Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Amazon Redshift how to copy from s3 and set a job_id

Amazon Redshift provides the ability to load table data from s3 objects using the "Copy" command. Is their a way to use the copy command, but also set additional "col=CONSTANT" for each inserted row.

I want to set a job_id (which is not in the source data) on each copied row, and I think it would be a shame to have to execute a few million inserts just so each row has a job attribute, when "copy" gets me 99% of the way there with much better performance.

Maybe there is a more clever solution?

like image 928
gbegley Avatar asked May 13 '13 14:05

gbegley


1 Answers

If you want all your rows added in a single COPY command to have the same value of job_id, then you may COPY data into staging table, then add job_id column into that table, then insert all data from the staging table into final table like:

CREATE TABLE destination_staging (LIKE destination);
ALTER TABLE destination_staging DROP COLUMN job_id;
COPY destination_staging FROM 's3://data/destination/(...)' (...)
ALTER TABLE destination_staging ADD COLUM job_id INT DEFAULT 42;
INSERT INTO destination SELECT * FROM destination_staging ORDER BY sortkey_column;
DROP TABLE destination_staging;
ANALYZE TABLE destination;
VACUUM destination;

ANALYZE and VACUUM are not necessary, but highly recommended in order to update query analyzer and put all new data into correct positions.

like image 83
Tomasz Tybulewicz Avatar answered Oct 11 '22 07:10

Tomasz Tybulewicz