Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Redshift: serializable isolation error (1023) despite LOCK

I am running several batch ETL operations in parallel on my Redshift cluster.

My pipeline does the following:

Do a a bunch of stuff on temporary staging table. At the end, upsert into the final table (permanent and shared across process) by doing:

BEGIN;
LOCK table X;
DELETE FROM X USING stage_table...
INSERT INTO X ...
END;

Still, when I have several process in parallel, some fail with:

ERROR: 1023 DETAIL: Serializable isolation violation on table - 142443, transactions > forming the cycle are: 388224, 388226 (pid:32012)

(where 142443 is my table X)

When I run the process one-by-one everything works like a charm. I've used the lock with success on other processes (and verified that it worked as intended) so I'm puzzled here. Any help appreciated!

like image 379
pcothenet Avatar asked May 15 '15 17:05

pcothenet


1 Answers

This is expected. The transaction isolation level used by Redshift is SERIALIZABLE as clearly stated in the AWS Doc

Note: READ UNCOMMITTED, READ COMMITTED, and REPEATABLE READ have no operational impact and map to SERIALIZABLE in Amazon Redshift.

Concretely speaking this means that if you run SQL statements in parallel that are not SERIALIZABLE (can be run in any order with no result difference), you will get the isolation level error.

By the way, Redshift give you tools to figure out which queries are conflicting. Using the numbers you get in the above log message you can query as follows:

select query, trim(querytxt) as sqlquery from stl_query where xid = 388224;

388224 the transaction_id that form the cycle.

like image 74
isaac.hazan Avatar answered Sep 29 '22 01:09

isaac.hazan