I have several workers, each holding its own connection to PostgreSQL. The workers operate on different tables.
The workers handle parallel requests from outside the system. One of the tables being accessed is the table of users. When some information arrives, I first need to ensure there is a record for that user in the table. If there is none, I want to create it first.
I'm using the following idiom:

if [user does not exist] then [create user]

The code of [user does not exist] is:

SELECT id FROM myschema.users WHERE userId='xyz'

and I test whether any row is returned. The (simplified) code of [create user] is:

INSERT INTO myschema.users VALUES ('xyz')
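Put together as one transaction, the idiom looks roughly like this (a sketch using the table and values from above; the comment marks where the race occurs):

```sql
BEGIN;
-- step 1: check whether the user exists
SELECT id FROM myschema.users WHERE userId='xyz';
-- <-- race window: another worker can commit its own INSERT right here
-- step 2: no row came back, so create the user
INSERT INTO myschema.users VALUES ('xyz');
COMMIT;
```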
When my system handles parallel streams of different information concerning the same user, I often get this PostgreSQL error:

Key (id)=(xyz) already exists

It happens because the SELECT command returns no rows, then another worker creates the user, and my worker attempts to do the same, resulting in a classic concurrency error.
My understanding of the PostgreSQL documentation was that, by default, whenever I implicitly start a transaction, the table stays locked until I commit it. I'm not using autocommit, and I only commit at block boundaries, e.g. after the whole if-else block.
Indeed, I could move the if-else logic into SQL directly, but that does not solve my locking problem in general. I assumed a "winner takes it all" scheme would apply: the first worker to execute the SELECT command would hold the locks until it calls COMMIT.
I've read many different topics here on SO, but I'm still not sure what the right solution is. Should I use explicit table locking, since the implicit locking does not work? How can I ensure that only a single worker owns a table at a time?
You have to take care of the transaction isolation level. It should be set to SERIALIZABLE.

The reason is phantom reads: the transaction does not lock the whole table, only the rows it has already read. So when another transaction inserts new rows, those rows were never locked, and the error appears.

SERIALIZABLE avoids this: PostgreSQL detects the conflict and aborts one of the transactions with a serialization failure, which the application must catch and retry.
You can do this via
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
The documentation: http://www.postgresql.org/docs/9.1/static/transaction-iso.html
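A sketch of what a worker's transaction might look like under SERIALIZABLE (the retry handling is an assumption about your client code; PostgreSQL does not retry for you):

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT id FROM myschema.users WHERE userId='xyz';
-- if no row was returned:
INSERT INTO myschema.users VALUES ('xyz');
COMMIT;
-- on SQLSTATE 40001 (serialization_failure): ROLLBACK and rerun
-- the whole transaction from the top
```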
If you want to learn more about this topic, I highly recommend this video: http://www.youtube.com/watch?v=zz-Xbqp0g0A
Actually, after some experimenting with ISOLATION LEVEL SERIALIZABLE as proposed by @maja, I discovered a much simpler mechanism:

PERFORM pg_advisory_lock(id);
...
-- do something that others must wait for
...
PERFORM pg_advisory_unlock(id);

where id is a BIGINT value which I may choose arbitrarily according to my application's logic.

This gave me both the power and the flexibility I was looking for.
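As a sketch of how this combines with the original check-then-insert (assumptions: hashtext() — an internal, undocumented PostgreSQL function — maps the text user id onto the BIGINT key space, and the transaction-scoped variant pg_advisory_xact_lock is used so the lock is released automatically at COMMIT):

```sql
BEGIN;
-- serialize all workers handling the same user; a hash collision only
-- causes harmless extra serialization, never missed locking
SELECT pg_advisory_xact_lock(hashtext('xyz'));

-- the check-then-insert is now race-free among cooperating workers
INSERT INTO myschema.users
SELECT 'xyz'
WHERE NOT EXISTS (SELECT 1 FROM myschema.users WHERE userId='xyz');
COMMIT;  -- also releases the advisory lock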