Say I'm working on an ETL that periodically reads source data from a CSV and replaces a table in Postgres (9.6).
Here is one approach:
```sql
BEGIN;
DROP TABLE IF EXISTS table;
CREATE TABLE table (...);
COPY table FROM STDIN (FORMAT csv, HEADER true);
CREATE INDEX ON table (...);
COMMIT;
```
Here is another:
```sql
BEGIN;
CREATE TABLE table_temp (...);
COPY table_temp FROM STDIN (FORMAT csv, HEADER true);
CREATE INDEX ON table_temp (...);
DROP TABLE IF EXISTS table;
ALTER TABLE table_temp RENAME TO table;
COMMIT;
```
Am I correct about the following?

1. In the first approach, the table is locked starting at the `DROP` command, so concurrent `SELECT` queries will block until the transaction is complete.
2. The second approach will not block concurrent `SELECT` queries until after the `COPY` and `CREATE INDEX` finish.
3. In either case, if I execute `ROLLBACK` instead of `COMMIT`, the table will be restored with all original data and indexes.

Also, aside from the name of the index, are there any other functional differences?
A note for anyone who thinks that renaming a table will be sufficient for their use case: beware! Renaming in PostgreSQL may not behave the way you expect.
More info here: What could go wrong, when swapping table contents using table rename in Postgresql?
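To illustrate the kind of surprise the linked question describes (a minimal sketch; the names `t`, `t_old`, and `v` are hypothetical): PostgreSQL tracks dependencies by table OID rather than by name, so a view defined on the old table keeps following that physical table through a rename instead of picking up a new table created under the old name.

```sql
-- Sketch of the rename pitfall: views follow the table's OID, not its name.
CREATE TABLE t (x int);
CREATE VIEW v AS SELECT x FROM t;

ALTER TABLE t RENAME TO t_old;   -- v now reads from t_old
CREATE TABLE t (x int);          -- new, empty table named t

-- v still selects from t_old, not from the new t.
-- Inspecting the view definition makes this visible:
SELECT pg_get_viewdef('v');      -- shows "FROM t_old"
```

The same applies to foreign keys and other dependent objects, which is why a drop-and-rename swap can silently detach dependents from the table name you thought they were bound to.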
Yes, all three of your assumptions are correct.
The end result is the same for both transactions, but the blocking caused by the first one lasts longer: its duration depends on the amount of data and the time it takes to create the index. The second transaction requires only a very brief lock to drop and rename the objects.
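If you want to observe this difference yourself, you can watch the lock from a second session while the ETL transaction is open (a sketch; `table` here stands for whatever your table is actually named):

```sql
-- Run in a second session while the ETL transaction is in progress.
-- An AccessExclusiveLock on the table appears as soon as DROP/ALTER
-- has executed: at the start of approach 1, but only at the very end
-- of approach 2.
SELECT l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE c.relname = 'table';
```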