I am reading through section 13.2 of the PostgreSQL Manual, but the textual descriptions found there are not enough clear, and lack examples.
For instance the following two paragraphs are not clear to whom is learning PostgreSQL:
INSERT with an ON CONFLICT DO UPDATE clause behaves similarly. In Read Committed mode, each row proposed for insertion will either insert or update. Unless there are unrelated errors, one of those two outcomes is guaranteed. If a conflict originates in another transaction whose effects are not yet visible to the INSERT, the UPDATE clause will affect that row, even though possibly no version of that row is conventionally visible to the command."
and
The Repeatable Read mode provides a rigorous guarantee that each transaction sees a completely stable view of the database. However, this view will not necessarily always be consistent with some serial (one at a time) execution of concurrent transactions of the same level. For example, even a read only transaction at this level may see a control record updated to show that a batch has been completed but not see one of the detail records which is logically part of the batch because it read an earlier revision of the control record.
Can someone give examples clarifying what is in these two paragraphs?
Does anyone know where can I find a formal description of the behavior of PostgreSQL isolation levels? I am looking for this because it is an advanced topic which I believe a formal description would help clarify how it works, and thus, help avoid concurrency bugs between transactions.
UPDATE: Another doubt I have is how does a serializable transaction is handled in terms of how the database machinery decides to commit or abort it, when it can run concurrently with other transactions at other isolation levels? Does the database decides on the result of the serializable transaction as if the other transactions were ran with serializable isolation too?
Thanks
UPDATE 2: So far the best I found regarding implementation details of isolation levels is the PostgreSQL Wiki Serializable Page.
Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions.
The highest isolation level, serializable, guarantees that a transaction will retrieve exactly the same data every time it repeats a read operation. But it uses a level of locking that is likely to impact other users in multi-user systems.
Dirty read doesn't occur in PostgreSQL even the transaction isolation level is READ UNCOMMITTED . READ UNCOMMITTED has the same characteristics of READ COMMITTED in PostgreSQL different from other databases so in short, READ UNCOMMITTED and READ COMMITTED are the same in PostgreSQL.
READ COMMITTED
: each SQL statement takes a new snapshot of the database, so every statement will always see changes made by concurrent transactions in the mean time as soon as they are committed. No serialization errors can occur.
REPEATABLE READ
: the first statement in the transaction takes a snapshot of the database that is retained for the whole transaction, so all statements see the same state of the database. Serialization errors can occur if you try to modify a row that has been modified by a concurrent transaction after your snapshot was taken. This isolation level is no more expensive than READ COMMITTED
.
SERIALIZABLE
: Any transaction that may lead to an outcome that is not consistent with some serial execution order of the transactions will be aborted with a serialization error. There can be false positives. This isolation level is more expensive than the others.
Answers to the specific questions:
INSERT ... ON CONFLICT
in read committed isolation:
If transaction 1 has inserted a row, but not yet committed, transaction 2 running INSERT ... ON CONFLICT
will wait until transaction 1 has committed or rolled back and then update or insert as appropriate. No constraint violation can occur.
Batch jobs and REPEATABLE READ
:
This paragraph is dark; ignore it. It tries to illustrate that two concurrent repeatable read translations can produce a result that is not consistent with any serial execution.
A better example might be two concurrent transactions that both read the same data and perform updates on them based on the read results. Each of these transactions cannot see the modifications of the other one.
Look into the PostgreSQL Wiki under "serializable" for more detailed examples.
Update question:
This question is not quite clear to me.
Serializable transactions take special "SI" locks which track read and write access and survive a commit. They don't block other sessions, but are used to determine if there may be a conflict. Serializable isolation level only works properly if all concurrent transactions use the serializable isolation level.
Regarding the question in UPDATE
Does the database decides on the result of the serializable transaction as if the other transactions were ran with serializable isolation too?"
the answer to is NO.
Serializability is only verified between concurrent transactions at serializable isolation level. For example, given two transactions T1 and T2 interleaved like this:
T1: begin
T1: set transaction isolation level read committed;
T1: update addresses set street = 'Sun street' where id = 1
T2: begin
T2: set transaction isolation level serializable;
T2: select street from addresses where id = 1
T2: update addresses set street = 'Sea street' where id = 2
T1: select street from addresses where id = 2
T1: commit
T2: commit
both T1 and T2 will commit. However, if T1 is set to serializable isolation then T2 will abort.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With