Amazon Redshift provides the ability to load table data from S3 objects using the COPY command. Is there a way to use the COPY command but also set an additional "col=CONSTANT" for each inserted row?
I want to set a job_id (which is not in the source data) on each copied row, and it would be a shame to have to execute a few million individual INSERTs just so each row has a job attribute, when COPY gets me 99% of the way there with much better performance.
Maybe there is a more clever solution?
If you want all the rows added in a single COPY command to share the same value of job_id, you can COPY the data into a staging table, add a job_id column with a default value to that table, and then insert everything from the staging table into the final table:
CREATE TABLE destination_staging (LIKE destination);
ALTER TABLE destination_staging DROP COLUMN job_id;
COPY destination_staging FROM 's3://data/destination/(...)' (...)
ALTER TABLE destination_staging ADD COLUMN job_id INT DEFAULT 42;
INSERT INTO destination SELECT * FROM destination_staging ORDER BY sortkey_column;
DROP TABLE destination_staging;
ANALYZE destination;
VACUUM destination;
ANALYZE and VACUUM are not strictly necessary, but they are highly recommended: ANALYZE refreshes the statistics used by the query planner, and VACUUM sorts the newly inserted rows into their correct positions.
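As a variation on the steps above, you can skip the second ALTER TABLE entirely and supply the constant directly in the INSERT's select list. This is only a sketch: the column names col1 and col2 are placeholders for destination's actual columns, which you would list explicitly.

CREATE TEMP TABLE destination_staging (LIKE destination);
ALTER TABLE destination_staging DROP COLUMN job_id;
COPY destination_staging FROM 's3://data/destination/(...)' (...)
-- Provide job_id as a literal in the SELECT instead of as a column default:
INSERT INTO destination (col1, col2, job_id)
SELECT col1, col2, 42 FROM destination_staging;
DROP TABLE destination_staging;

Using CREATE TEMP TABLE here also means the staging table is dropped automatically at the end of the session, and a different job can load concurrently with a different constant without the two runs interfering.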