A phantom read occurs when a transaction repeats a read operation on the same records but finds new records in the result set. READ UNCOMMITTED is the isolation level that permits dirty reads: when it is used, a transaction can read uncommitted data that may later be rolled back.
Nonrepeatable Reads: A nonrepeatable read occurs when a transaction reads the same row twice but gets different data each time. For example, suppose transaction 1 reads a row. Transaction 2 updates or deletes that row and commits the update or delete. If transaction 1 then rereads the row, it retrieves different values or finds that the row is gone.
A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first. This can occur when range locks are not acquired on performing a SELECT.
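If another transaction inserts or deletes rows and commits in the meantime, the REPEATABLE READ transaction will still see the same data, while the READ COMMITTED transaction will see the changed row count. (This holds for engines that implement REPEATABLE READ with a consistent snapshot, such as MySQL's InnoDB; the SQL standard itself still allows phantoms at this level.) REPEATABLE READ is really important for reporting because it gives a consistent view of the data set even while it is being modified. SERIALIZABLE does too, at a higher cost.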
From Wikipedia (which has great and detailed examples for this):
A non-repeatable read occurs when, during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.
and
A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first.
Simple examples:
select sum(x) from table;
will return a different result if rows have been added or deleted, even if none of the existing rows themselves have been updated.
In the above example, which isolation level should be used?
What isolation level you need depends on your application. There is a high cost to a "better" isolation level (such as reduced concurrency).
In your example, you won't have a phantom read, because you select only from a single row (identified by primary key). You can have non-repeatable reads, so if that is a problem, you may want to have an isolation level that prevents that. In Oracle, transaction A could also issue a SELECT FOR UPDATE, then transaction B cannot change the row until A is done.
A simple way I like to think about it is:
Both non-repeatable and phantom reads have to do with data modification operations from a different transaction, which were committed after your transaction began, and then read by your transaction.
Non-repeatable reads are when your transaction reads committed UPDATES from another transaction. The same row now has different values than it did when your transaction began.
Phantom reads are similar but when reading from committed INSERTS and/or DELETES from another transaction. There are new rows or rows that have disappeared since you began the transaction.
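To make the UPDATE versus INSERT distinction concrete, here is a minimal sketch in Python. It is a toy in-memory model, not how any real database engine works: the `committed` dict, the two transaction classes, and the `select_where` helper are all made up for illustration. One reader always sees the latest committed data (roughly READ COMMITTED), the other reads from a copy taken when it began (roughly a snapshot-based REPEATABLE READ).

```python
import copy

# Toy "table": primary key -> row. Purely illustrative.
committed = {1: {"x": 10}, 2: {"x": 20}}


class ReadCommittedTxn:
    """Every read sees whatever is committed at the moment of the read."""

    def _rows(self):
        return committed

    def read_row(self, key):
        return copy.deepcopy(self._rows().get(key))

    def select_where(self, predicate):
        # Range-style query: keys of all rows matching the filter.
        return sorted(k for k, row in self._rows().items() if predicate(row))


class SnapshotTxn(ReadCommittedTxn):
    """Every read sees the data as it was when the transaction began."""

    def __init__(self):
        self._snapshot = copy.deepcopy(committed)

    def _rows(self):
        return self._snapshot


def at_least_20(row):
    return row["x"] >= 20


rc, snap = ReadCommittedTxn(), SnapshotTxn()

# First reads: both transactions agree.
rc_row_before, snap_row_before = rc.read_row(1), snap.read_row(1)
rc_range_before, snap_range_before = rc.select_where(at_least_20), snap.select_where(at_least_20)

# Another transaction commits an UPDATE of row 1 ...
committed[1]["x"] = 11
# ... and an INSERT of a new row 3 that matches the filter.
committed[3] = {"x": 30}

# Second reads.
rc_row_after, snap_row_after = rc.read_row(1), snap.read_row(1)
rc_range_after, snap_range_after = rc.select_where(at_least_20), snap.select_where(at_least_20)

# Non-repeatable read: the same row now has a different value.
print("read committed row:  ", rc_row_before, "->", rc_row_after)
# Phantom read: row 2 is untouched, but a new row joined the result set.
print("read committed range:", rc_range_before, "->", rc_range_after)
# The snapshot reader sees neither change.
print("snapshot row:        ", snap_row_before, "->", snap_row_after)
print("snapshot range:      ", snap_range_before, "->", snap_range_after)
```

Note that in the range query no previously returned row changed at all; only the membership of the result set did, which is what separates a phantom from a non-repeatable read.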
Dirty reads are similar to non-repeatable and phantom reads, but relate to reading UNCOMMITTED data, and occur when an UPDATE, INSERT, or DELETE from another transaction is read, and the other transaction has NOT yet committed the data. It is reading "in progress" data, which may not be complete, and may never actually be committed.
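A dirty read can be sketched the same way. Again this is only a toy model under stated assumptions: the `committed` and `pending` dicts and the `read` function are hypothetical names, and a real engine tracks uncommitted changes very differently.

```python
# Toy model: committed data plus one writer's pending (uncommitted) changes.
committed = {1: {"balance": 100}}
pending = {}  # uncommitted changes of a hypothetical in-flight writer


def read(key, isolation):
    """Return the row as seen by a reader at the given isolation level."""
    if isolation == "READ UNCOMMITTED" and key in pending:
        return dict(pending[key])  # dirty read: sees "in progress" data
    return dict(committed[key])    # otherwise only committed data is visible


# The writer updates the row but has not committed yet.
pending[1] = {"balance": 0}

dirty = read(1, "READ UNCOMMITTED")
clean = read(1, "READ COMMITTED")

# The writer rolls back: the pending change is discarded and was never committed.
pending.clear()

after_rollback = read(1, "READ UNCOMMITTED")

print("during the write, READ UNCOMMITTED saw:", dirty)
print("during the write, READ COMMITTED saw:  ", clean)
print("after the rollback:                    ", after_rollback)
```

The READ UNCOMMITTED reader acted on a balance that, after the rollback, never officially existed.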
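The Non-Repeatable Read anomaly looks as follows: transaction A reads a row; transaction B updates that row and commits; transaction A reads the same row again and gets a different value.
The Phantom Read anomaly can happen as follows: transaction A runs a query with a filter and gets a set of rows; transaction B inserts a row matching that filter and commits; transaction A repeats the query and gets an additional row.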
So, while the Non-Repeatable Read applies to a single row, the Phantom Read is about a range of records that satisfy a given query's filtering criteria.
Non-repeatable reads are caused by an UPDATE query from another transaction.
Phantom reads are caused by an INSERT or DELETE query from another transaction.
Note: DELETE statements from another transaction also have a very low probability of causing non-repeatable reads in certain cases. This happens when the DELETE statement removes the very row that your current transaction was querying. But this is a rare case, and even less likely in a database that has millions of rows in each table. Tables containing transaction data usually have high data volume in any production environment.
We may also observe that UPDATEs tend to be more frequent than INSERTs or DELETEs in most use cases. Where a workload consists only of UPDATEs, only the danger of non-repeatable reads remains; phantom reads are not possible. This is why UPDATEs are treated differently from INSERTs and DELETEs, and why the resulting anomaly has a different name.
There is also an additional processing cost associated with handling INSERTs and DELETEs, compared with handling only UPDATEs.
Then why not just set the transaction to SERIALIZABLE at all times? Because the SERIALIZABLE setting makes transactions very slow, which we also don't want.
In fact, transaction time consumption ranks in the following order:
SERIALIZABLE > REPEATABLE_READ > READ_COMMITTED > READ_UNCOMMITTED
So READ_UNCOMMITTED setting is the fastest.
In practice, we need to analyze the use case and choose an isolation level that keeps transaction time low while still preventing most anomalies.
Note that the default varies by database: MySQL's InnoDB defaults to REPEATABLE READ, while PostgreSQL, Oracle, and SQL Server default to READ COMMITTED. Admins and architects may prefer a weaker level than SERIALIZABLE as the default so that the platform performs better.
There is a difference in how protection against these two anomalies is implemented.
To prevent a "non-repeatable read", row locking is needed.
To prevent a "phantom read", range (scoped) locking is needed, or even table locking.
Both can be implemented using the two-phase locking protocol.
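The difference between row locking and range or table locking can be sketched with a toy lock manager in Python. Everything here is hypothetical and heavily simplified: `ToyLockManager`, `LockConflict`, and `attempt` are made-up names, there is only one kind of lock (no shared versus exclusive modes), and a conflict raises an exception instead of blocking so the example stays single-threaded.

```python
class LockConflict(Exception):
    """Raised instead of blocking, to keep the sketch single-threaded."""


class ToyLockManager:
    def __init__(self):
        self.row_locks = {}     # key -> owner holding a lock on that row
        self.table_lock = None  # owner holding a lock on the whole table

    def lock_row(self, owner, key):
        self.check_write(owner, key)
        self.row_locks[key] = owner

    def lock_table(self, owner):
        others_hold_rows = any(o != owner for o in self.row_locks.values())
        if self.table_lock not in (None, owner) or others_hold_rows:
            raise LockConflict("table is in use by another transaction")
        self.table_lock = owner

    def check_write(self, owner, key):
        """Called before an UPDATE, DELETE, or INSERT touching `key`."""
        if self.table_lock not in (None, owner):
            raise LockConflict("table is locked")
        if self.row_locks.get(key, owner) != owner:
            raise LockConflict(f"row {key} is locked")

    def release_all(self, owner):
        # Two-phase locking: locks are acquired as needed during the
        # transaction and released only together, at its end.
        self.row_locks = {k: o for k, o in self.row_locks.items() if o != owner}
        if self.table_lock == owner:
            self.table_lock = None


def attempt(action):
    try:
        action()
        return "allowed"
    except LockConflict:
        return "blocked"


locks = ToyLockManager()

# Transaction A locks the rows it has read (row-level locking).
for key in (1, 2):
    locks.lock_row("A", key)

# B cannot change a row A has read, so A's reads are repeatable ...
update_existing = attempt(lambda: locks.check_write("B", 1))
# ... but B can still insert a brand-new row: a phantom is possible.
insert_new = attempt(lambda: locks.check_write("B", 3))

# With a table-level lock, the insert is stopped as well.
locks.lock_table("A")
insert_with_table_lock = attempt(lambda: locks.check_write("B", 3))

# When A finishes, all of its locks are released at once.
locks.release_all("A")
after_release = attempt(lambda: locks.check_write("B", 1))

print(update_existing, insert_new, insert_with_table_lock, after_release)
```

Row locks can only protect rows that already exist, which is why they stop non-repeatable reads but not phantoms; blocking inserts requires locking something larger than the rows read, such as a key range or the whole table.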