
Handling race conditions in PostgreSQL

I have several workers, each holding its own connection to PostgreSQL. The workers work with different tables.

The workers handle parallel requests from outside the system. One of the tables being accessed is the table of users. When information arrives, I first need to ensure there is a record for the user in the table. If there is no record, I want to create one first.

I'm using the following idiom:

if [user does not exist] then [create user]

The code of [user does not exist] is:

SELECT id FROM myschema.users WHERE userId='xyz'

and I test whether any row is returned.

The (simplified) code of [create user] is:

INSERT INTO myschema.users VALUES ('xyz')

When my system handles parallel streams of different information concerning the same user, I often get a PostgreSQL error:

Key (id)=(xyz) already exists

It happens because the SELECT command returns no rows, then another worker creates the user, and my worker attempts to do the same, resulting in a classic concurrency error.

My understanding from the PostgreSQL documentation was that, by default, whenever I implicitly start a transaction, the table stays locked until I commit it. I'm not using autocommit, and I only commit the transaction in blocks, e.g. after the whole if-else block.

Indeed, I could put the if-else logic into SQL directly, but that does not solve my locking problem in general. I assumed that a "winner takes it all" approach would work, and that the first worker to execute the SELECT command would own the locks until it calls COMMIT.
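For reference, collapsing the check-and-insert into a single statement might look like this (a sketch, assuming the column is named `id` as in the error message; note that under the default READ COMMITTED isolation level two concurrent workers can still both pass the NOT EXISTS check, which is exactly the race described above):

```sql
-- Sketch: insert only if the user is not already present.
-- Under READ COMMITTED this still does not close the race window,
-- since concurrent transactions don't see each other's uncommitted rows.
INSERT INTO myschema.users (id)
SELECT 'xyz'
WHERE NOT EXISTS (
    SELECT 1 FROM myschema.users WHERE id = 'xyz'
);
```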

I've read many different topics here on SO, but I'm still not sure what the right solution is. Should I use explicit locking of tables, because the implicit locking does not work? How can I ensure that only a single worker owns a table at a time?

Asked Jan 30 '14 by Tregoreg


2 Answers

You need to pay attention to the transaction isolation level. It should be set to SERIALIZABLE.

The reason is phantom reads: the transaction doesn't lock the whole table, only the rows it has already read.

So if another transaction inserts new data, those rows haven't been locked yet, and the error appears.

SERIALIZABLE avoids this: in PostgreSQL, one of two conflicting transactions fails with a serialization error and must be retried, so concurrent transactions cannot both succeed.

You can do this via

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
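Applied to the asker's check-then-create block, it might look like this (a sketch; when two workers race, one COMMIT fails with SQLSTATE 40001, serialization_failure, and the application must retry the transaction):

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- check whether the user exists
SELECT id FROM myschema.users WHERE id = 'xyz';

-- create the user only if the check returned no rows
INSERT INTO myschema.users VALUES ('xyz');

-- one of two racing workers fails here with SQLSTATE 40001
-- (serialization_failure) and must run the whole block again
COMMIT;
```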

Documentation: http://www.postgresql.org/docs/9.1/static/transaction-iso.html

If you want to learn more about this topic, I highly recommend this video: http://www.youtube.com/watch?v=zz-Xbqp0g0A

Answered Oct 11 '22 by maja


Actually, after some messing with ISOLATION LEVEL SERIALIZABLE as proposed by @maja, I discovered a much simpler mechanism:

PERFORM pg_advisory_lock(id);
-- ... do something that others must wait for ...
PERFORM pg_advisory_unlock(id);

where id is a BIGINT value which I may choose arbitrarily according to my application's logic.

This gave me both the power and the flexibility I was looking for.
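Note that PERFORM is PL/pgSQL syntax; from a plain SQL connection the same technique can be sketched with SELECT. Here `hashtext()` (a built-in PostgreSQL hash function) is one hypothetical way to derive a BIGINT-compatible lock id from the user key; any application-defined mapping works:

```sql
-- Session-level advisory lock taken from plain SQL.
-- hashtext() maps the user key to an integer lock id;
-- all workers serialize on the same id for the same user.
SELECT pg_advisory_lock(hashtext('xyz'));

-- critical section: check-then-insert is now race-free for this key
INSERT INTO myschema.users
SELECT 'xyz'
WHERE NOT EXISTS (SELECT 1 FROM myschema.users WHERE id = 'xyz');

SELECT pg_advisory_unlock(hashtext('xyz'));
```

PostgreSQL also provides pg_advisory_xact_lock, which is released automatically at the end of the transaction, so a forgotten unlock cannot leave the key blocked.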

Answered Oct 11 '22 by Tregoreg