I'm writing an ETL process in Redshift and I would like it to be more parallelizable. The process creates a number of temporary tables and then merges them into the main table, inside a transaction that looks like this:
CREATE TEMPORARY TABLE temporary_table
(LIKE production_table);
BEGIN TRANSACTION;
LOCK TABLE production_table;
/* Dedupe and then insert */
DELETE FROM temporary_table
USING production_table
WHERE temporary_table.id = production_table.id
AND temporary_table.date <= production_table.date;
INSERT INTO production_table
SELECT * FROM temporary_table;
END TRANSACTION;
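For illustration outside Redshift, the dedupe-then-insert step above can be replayed against an in-memory SQLite database. SQLite has no `DELETE ... USING`, so that statement is rewritten with an equivalent `EXISTS` subquery; the table and column names match the question, and the sample rows are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE production_table (id INTEGER, date TEXT)")
cur.execute("CREATE TABLE temporary_table (id INTEGER, date TEXT)")
cur.executemany("INSERT INTO production_table VALUES (?, ?)",
                [(1, "2020-01-02"), (2, "2020-01-01")])
cur.executemany("INSERT INTO temporary_table VALUES (?, ?)",
                [(1, "2020-01-01"),   # stale: not newer than the production row
                 (2, "2020-01-05"),   # newer: survives the dedupe
                 (3, "2020-01-03")])  # brand-new id: survives the dedupe
# Dedupe: drop staging rows that are not newer than the production copy
# (SQLite equivalent of the DELETE ... USING in the question).
cur.execute("""
    DELETE FROM temporary_table
    WHERE EXISTS (
        SELECT 1 FROM production_table
        WHERE temporary_table.id = production_table.id
          AND temporary_table.date <= production_table.date)
""")
cur.execute("INSERT INTO production_table SELECT * FROM temporary_table")
rows = cur.execute(
    "SELECT id, date FROM production_table ORDER BY id, date").fetchall()
print(rows)
# → [(1, '2020-01-02'), (2, '2020-01-01'), (2, '2020-01-05'), (3, '2020-01-03')]
```

Note that, as in the original, only the staging table is deduplicated: the old production row for id 2 remains alongside the newer one.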
I expect the LOCK TABLE production_table; statement to block until any other transactions finish, but whenever I run this in practice, I get:
ERROR: deadlock detected
DETAIL: Process 4868 waits for AccessExclusiveLock on relation 165643 of database 154587; blocked by process 4863.
Process 4863 waits for AccessExclusiveLock on relation 165643 of database 154587; blocked by process 4868.
This happens between the two delete/insert statements. Why doesn't the LOCK TABLE step prevent deadlocks? How can I structure my writes so that they block until the other transaction finishes?
I have figured this out.
CREATE TEMPORARY TABLE temporary_table
(LIKE production_table)
takes an implicit lock on production_table, held for the lifetime of the temporary table. Because the CREATE runs before BEGIN TRANSACTION, each concurrent session already holds that lock while waiting for LOCK TABLE's exclusive lock, so two sessions block each other and deadlock. I spelled out the production table's actual schema in the CREATE TEMPORARY TABLE statement instead of using LIKE, and the deadlocks went away.
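A sketch of the fix, assuming hypothetical column definitions (only id and date appear in the question; any other columns would be listed the same way):

```sql
-- Spell out the production table's schema instead of (LIKE production_table),
-- so no implicit lock on production_table is taken before the transaction.
CREATE TEMPORARY TABLE temporary_table (
    id   BIGINT,     -- assumed type; match production_table exactly
    date TIMESTAMP   -- assumed type; match production_table exactly
);
```

The trade-off is that this DDL must be kept in sync by hand whenever the production table's schema changes.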