What is the difference between Non-Repeatable Read and Phantom Read?

Multi tool use
What is the difference between Non-Repeatable Read and Phantom Read?
What is the difference between non-repeatable read and phantom read?
I have read the Isolation (database systems) article from Wikipedia, but I have a few doubts. In the below example, what will happen: the non-repeatable read and phantom read?
SELECT ID, USERNAME, accountno, amount FROM USERS WHERE ID=1
1----MIKE------29019892---------5000
UPDATE USERS SET amount=amount+5000 where ID=1 AND accountno=29019892;
COMMIT;
SELECT ID, USERNAME, accountno, amount FROM USERS WHERE ID=1
Another doubt is, in the above example, which isolation level should be used? And why?
8 Answers
8
From Wikipedia (which has great and detailed examples for this):
A non-repeatable read occurs, when during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.
and
A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first.
Simple examples:
select sum(x) from table;
In the above example,which isolation level to be used?
What isolation level you need depends on your application. There is a high cost to a "better" isolation level (such as reduced concurrency).
In your example, you won't have a phantom read, because you select only from a single row (identified by primary key). You can have non-repeatable reads, so if that is a problem, you may want to have an isolation level that prevents that. In Oracle, transaction A could also issue a SELECT FOR UPDATE, then transaction B cannot change the row until A is done.
I don't really understand the logic of such a syntax... A NON-repeatable read occurs when the read is repeated (and a different value obtained)??!...
– serhio
Apr 15 '14 at 9:30
@serhio "non-repeatable" refers to the fact that you can read a value once and get x as the result, and then read again and get y as the result, so you cannot repeat (non-repeatable) the same results from two separate queries of the same row, because that row value was updated in between reads.
– BateTech
Sep 9 '15 at 1:09
@Thilo Any real use case example where repeatable-read might create issues and where it is necessary ?
– user104309
Aug 18 '17 at 21:13
What if the PK is modified in another transaction? Could that result in a phantom read? (A strange thing to do in most cases, but not impossible.)
– jpmc26
May 13 at 13:25
A simple way I like to think about it is:
Both non-repeatable and phantom reads have to do with data modification operations from a different transaction, which were committed after your transaction began, and then read by your transaction.
Non-repeatable reads are when your transaction reads committed UPDATES from another transaction. The same row now has different values than it did when your transaction began.
Phantom reads are similar but when reading from committed INSERTS and/or DELETES from another transaction. There are new rows or rows that have disappeared since you began the transaction.
Dirty reads are similar to non-repeatable and phantom reads, but relate to reading UNCOMMITTED data, and occur when an UPDATE, INSERT, or DELETE from another transaction is read, and the other transaction has NOT yet committed the data. It is reading "in progress" data, which may not be complete, and may never actually be committed.
I am amazed (or say I don't believe). Dirty read is: Reading a modified data (by another transaction) which actually is not committed or may never be committed. Where possibly this scenario could be used? (Its not I don't believe you - Actually I haven't grasped the concept properly) - Please do not mind (You might be able to understand newbie Frustration )
– PHP Avenger
Sep 8 '15 at 21:57
It has to do with transaction isolation levels and concurrency. Using the default isolation level, you will not get dirty reads, and in most cases, you want to avoid dirty reads. There are isolation levels or query hints that will allow dirty reads, which in some cases is an acceptable trade off in order to achieve higher concurrency or is necessary due to an edge case, such as troubleshooting an in progress transaction from another connection. It is good that the idea of a dirty read doesn't pass the "smell test" for you, bc as a general rule, they should be avoided, but do have a purpose.
– BateTech
Sep 8 '15 at 23:47
What if a delete is followed by an insert, couldn't that be used to effectively achieve an update?
– Kevin Wheeler
Jul 27 '16 at 22:36
@BateTech UPDATE or DELETE both can take place for Non-repeatable reads or it is only UPDATE?
– Dipon Roy
Apr 23 at 6:52
@DiponRoy great question. The locking implemented if using repeatable read (RR) isolation should prevent deletes from occurring on rows that have been selected. I've seen varying definitions of the 2 iso levels over the years, mainly saying phantom is a change in the collection/# rows returned and RR is the same row being changed. I just checked the updated MS SQL documentation says that deletes can cause non-RR (docs.microsoft.com/en-us/sql/odbc/reference/develop-app/… ) so I think it would be safe to group deletes in the RR category too
– BateTech
Apr 26 at 19:08
There is a difference in the implementation between these two kinds isolation levels.
For "non-repeatable read", row-locking is needed.
For "phantom read",scoped-locking is needed, even a table-locking.
We can implement these two levels by using two-phase-locking protocol.
To implement repeatable read or serializable, there is no need to use row-locking.
– a_horse_with_no_name
Aug 2 '17 at 20:53
In a system with non-repeatable reads, the result of Transaction A's second query will reflect the update in Transaction B - it will see the new amount.
In a system that allows phantom reads, if Transaction B were to insert a new row with ID = 1, Transaction A will see the new row when the second query is executed; i.e. phantom reads are a special case of non-repeatable read.
I don't think the explanation of a phantom read is correct. You can get phantom reads even if non-commit data is never visible. See the example on Wikipedia (linked in the comments above).
– Thilo
Jun 15 '12 at 5:14
Dirty read : read UNCOMMITED data from anouther transaction.
Non-repeatable read : Read COMMITED data from an UPDATE query from anouther transaction.
Phantom read : Read COMMITED data from an INSERT or DELETE query from anouther transaction.
Note here that UPDATES may be a more frequent job in certain usecases rather than actual INSERT or DELETES - in such cases , danger of Non-repeatable reads remain only- phantom reads are not possible in those cases. This why UPDATES are treated differently from INSERT-DELETE and the concerned anomaly is also named differently.
There is also an additional processing cost associated with handling for INSERT-DELETES , rather than just handle the UPDATES.
Isolation level TRANSACTION_READ_UNCOMMITTED prevents nothing. Its the zero isolation level.
Isolation level TRANSACTION_READ_COMMITTED prevents just one, ie. Dirty reads.
Isolation level TRANSACTION_REPEATABLE_READ prevents two anomalies : Dirty reads and Non-repeatable reads.
Isolation level TRANSACTION_SERIALIZABLE prevents all three anomalies : Dirty reads, Non-repeatable reads and Phantom reads.
Then why not just set the transaction SERIALIZABLE at all times ??
Well , the answer to the above question is : SERIALIZABLE setting makes transactions very slow , which we again don't want.
In fact transaction time consumtion is in the following rate :
SERIALIZABLE > REPEATABLE_READ > READ_COMMITTED > READ_UNCOMMITTED .
So READ_UNCOMMITTED setting is the fastest .
Actually we need to analyze the usecase and decide an isolation level so that we optimize the transaction time and also prevent most anomalies.
Note that Databases by default have REPEATABLE_READ setting.
UPDATE or DELETE both can take place for Non-repeatable reads or it is only UPDATE?
– Dipon Roy
Apr 23 at 6:53
The accepted answer indicates most of all that the so-called distinction between the two is actually not significant at all.
If "a row is retrieved twice and the values within the row differ between reads", then they are not the same row (not the same tuple in correct RDB speak) and it is then indeed by definition also the case that "the collection of rows returned by the second query is different from the first".
As to the question "which isolation level should be used", the more your data is of vital importance to someone, somewhere, the more it will be the case that Serializable is your only reasonable option.
As explained in this article, the Non-Repeatable Read anomaly looks as follows:
In this article about Phantom Read, you can see that this anomaly can happen as follows:
So, while the Non-Repeatable Read applies to a single row, the Phantom Read is about a range of records which satisfy a given query filtering criteria.
I think there are some difference between Non-repeateable-read & phantom-read.
The Non-repeateable means there are tow transaction A & B. if B can notice the modification of A, so maybe happen dirty-read, so we let B notices the modification of A after A committing.
There is new issue: we let B notice the modification of A after A committing, it means A modify a value of row which the B is holding, sometime B will read the row again, so B will get new value different with first time we get, we call it Non-repeateable, to deal with the issue, we let the B remember something(cause i don't know what will be remembered yet) when B start.
Let's think about the new solution, we can notice there is new issue as well, cause we let B remember something, so whatever happened in A, the B can't be affected, but if B want to insert some data into table and B check the table to make sure there is no record, but this data has been inserted by A, so maybe occur some error. We call it Phantom-read.
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
en.wikipedia.org/wiki/Isolation_(database_systems)
– Pavel Veller
Jun 15 '12 at 1:59