To query the top-n rows in Oracle, the common approach is to use ROWNUM. So the following query seems fine (it gets the 5 most recent payments):
select a.paydate, a.amount
  from (select t.paydate, t.amount
          from payments t
         where t.some_id = id
         order by t.paydate desc) a
 where rownum <= 5;
But for very big tables it is inefficient - for me it ran for ~10 minutes. So I tried other queries, and I ended up with this one, which runs in less than a second:
select *
  from (select a.*, rownum
          from (select t.paydate, t.amount
                  from payments t
                 where t.some_id = id
                 order by t.paydate desc) a)
 where rownum <= 5;
To find out what is happening, I looked at the execution plans for each query. For the first query:
SELECT STATEMENT, GOAL = ALL_ROWS       7          5          175
 COUNT STOPKEY
  VIEW                                  7          5          175
   TABLE ACCESS BY INDEX ROWID          7  316576866   6331537320
    INDEX FULL SCAN DESCENDING          4          6
And for second:
SELECT STATEMENT, GOAL = ALL_ROWS      86    5    175
 COUNT STOPKEY
  VIEW                                 86   81   2835
   COUNT
    VIEW                               86   81   1782
     SORT ORDER BY                     86   81   1620
      TABLE ACCESS BY INDEX ROWID      85   81   1620
       INDEX RANGE SCAN                 4   81
Obviously, it is the INDEX FULL SCAN DESCENDING that makes the first query inefficient for big tables. But I cannot really tell the logic of the two queries apart just by looking at them. Could anyone explain the logical difference between the two queries in plain language?
Thanks in advance!
First of all, as mentioned in Alex's comment, I'm not sure that your second version is 100% guaranteed to give you the right rows -- since the "middle" block of the query does not have an explicit ORDER BY, Oracle is under no obligation to pass the rows up to the outer query block in any specific order. However, there doesn't seem to be any particular reason that it would change the order in which the rows are passed up from the innermost block, so in practice it will probably work.
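As an aside, one common way to get a deterministic result regardless of how rows are passed up is to assign the row numbers with the ROW_NUMBER() analytic function inside the ordered block, so the filter is tied to the explicit ORDER BY rather than to ROWNUM. A sketch, using the same tables and columns as the question:

```sql
-- ROW_NUMBER() is computed against an explicit ORDER BY,
-- so the "top 5" filter does not depend on row-delivery order.
select paydate, amount
  from (select t.paydate, t.amount,
               row_number() over (order by t.paydate desc) rn
          from payments t
         where t.some_id = id)
 where rn <= 5
 order by paydate desc;
```

The optimizer can often push this rank filter down (a WINDOW SORT PUSHED RANK step), so it tends to behave similarly to a STOPKEY plan while remaining correct by construction.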
And this is why Oracle chooses a different plan for the second query -- it is logically not able to apply the STOPKEY operation to the innermost query block.
I think in the first case, the optimizer is assuming that id values are well-distributed and that, for any given value, there are likely to be some very recent transactions. Since it can see that it only needs to find the 5 most recent matches, it calculates that it is more efficient to scan the rows in descending order of paydate using an index, look up the corresponding id and other data from the table, and stop when it has found the first 5 matches. I suspect that you would see very different performance for this query depending on the specific id value that you use -- if the id has a lot of recent activity, the rows should be found very quickly, but if it does not, the index scan may have to do a lot more work.
In the second case, I believe it's not able to apply the STOPKEY optimization to the innermost block due to the extra layer of nesting. In that case, the index full scan becomes much less attractive, since it would always need to scan the entire index. Therefore it chooses to do an index lookup on id (I'm assuming) followed by an actual sort on the date. If the given id value matches a small subset of rows, this is likely to be more efficient -- but if you give an id that has lots of rows spread throughout the entire table, I would expect it to become slower, since it will have to access and sort many rows.
So, I would guess that your tests have used id value(s) that have relatively few rows, none of them very recent. If this is a typical use case, then the second query is probably better for you (again, with the caveat that I'm not sure it is technically guaranteed to produce the correct result set). But if typical values would be more likely to have many matching rows and/or more likely to have 5 very recent rows, then the first query and plan might be better.
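For what it's worth, if you are on Oracle 12c or later, the same top-N requirement can be expressed directly with the row-limiting clause, which avoids the nested-ROWNUM idiom entirely and leaves the plan choice to the optimizer:

```sql
-- Oracle 12c+ row-limiting clause; same intent as the ROWNUM queries above.
select t.paydate, t.amount
  from payments t
 where t.some_id = id
 order by t.paydate desc
 fetch first 5 rows only;
```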