I have a set of consecutive rows I want to get based upon their primary key, which is an auto-incrementing integer. Assuming that there are no holes, is there any performance difference between:
SELECT * FROM `theTable` WHERE `id` IN (n, ... nk);
and:
SELECT * FROM `theTable` WHERE `id` BETWEEN n AND nk;
BETWEEN is faster because it needs fewer comparisons. With an IN clause, each element of the list is traversed every time. But the two serve different purposes: BETWEEN is used when you are comparing against a range of values in some kind of sequence.
Both operators are used to match multiple values in a table. The difference is that BETWEEN selects a range of data between two values, while IN lets you specify a list of individual values.
It was concluded that SQL Server offers better performance than MySQL in terms of response time. Except for the INSERT queries, SQL Server consistently took less time than MySQL in all the other test cases. In terms of scaling up, MySQL showed a twofold increase in time when the number of rows went up.
From a maintainability perspective, BETWEEN is probably better.
BETWEEN should outperform IN in this case (but do measure and check execution plans, too!), especially as n grows and as long as statistics are still accurate. Let's assume:
m is the size of your table
n is the size of your range
When n is tiny compared to m:
In theory, BETWEEN can be implemented with a single "range scan" (Oracle speak) on the primary key index, and then traverse at most n index leaf nodes. The complexity will be O(n + log m).
IN is usually implemented as a series (loop) of n "range scans" on the primary key index. With m being the size of the table, the complexity will always be O(n * log m), which is always worse (negligible for very small tables m or very small ranges n).
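The O(n + log m) versus O(n * log m) argument can be illustrated with a rough sketch. It models the primary key index as a sorted Python list, which is only an assumption for illustration (a real database uses a B-tree and its optimizer may do something smarter). The helper names `binary_search_steps`, `between_cost` and `in_cost` are hypothetical.

```python
def binary_search_steps(keys, target):
    """Find the leftmost position of target in sorted keys, counting probes."""
    lo, hi, steps = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo, steps


def between_cost(keys, low, high):
    """One index lookup for the lower bound, then walk forward: ~ log m + n."""
    pos, steps = binary_search_steps(keys, low)
    rows = []
    while pos < len(keys) and keys[pos] <= high:
        rows.append(keys[pos])
        pos += 1
        steps += 1  # one step per visited entry
    return rows, steps


def in_cost(keys, values):
    """One independent index lookup per listed value: ~ n * log m."""
    rows, steps = [], 0
    for value in values:
        pos, probes = binary_search_steps(keys, value)
        steps += probes
        if pos < len(keys) and keys[pos] == value:
            rows.append(keys[pos])
    return rows, steps


# Toy "table" with m = 1,000,000 consecutive ids and a range of n = 100.
keys = list(range(1, 1_000_001))
low, high = 500_000, 500_099

between_rows, between_steps = between_cost(keys, low, high)
in_rows, in_steps = in_cost(keys, range(low, high + 1))

print("same rows:", between_rows == in_rows)
print("BETWEEN steps:", between_steps)
print("IN steps:", in_steps)
```

In this model both approaches return the same rows, but the IN loop repeats the log m descent for every value, which is where the extra cost comes from.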
When n is a significant portion of m:
In any case, you'll get a full table scan and evaluate the predicate on each row:
BETWEEN needs to evaluate two predicates: one for the lower and one for the upper bound. The complexity is O(m).
IN needs to evaluate at most n predicates. The complexity is O(m * n), which is again always worse, or perhaps O(m) if the database can optimise the IN list to be a hashmap, rather than a list of predicates.
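The full-scan case can be sketched the same way. The snippet below is a toy model, not how any particular database evaluates predicates: it counts comparisons per row for a two-sided range check, for an IN list checked value by value, and for an IN list turned into a hash set. The function names and the sizes m = 2000, n = 1000 are made up for illustration.

```python
def scan_between(rows, low, high):
    """Full scan with at most two comparisons per row: O(m)."""
    matched, comparisons = [], 0
    for row in rows:
        comparisons += 1
        if row >= low:
            comparisons += 1
            if row <= high:
                matched.append(row)
    return matched, comparisons


def scan_in_list(rows, values):
    """Full scan comparing each row against the list one value at a time: O(m * n)."""
    matched, comparisons = [], 0
    for row in rows:
        for value in values:
            comparisons += 1
            if row == value:
                matched.append(row)
                break
    return matched, comparisons


def scan_in_hash(rows, values):
    """Full scan with one hash lookup per row: O(m) on average."""
    lookup = set(values)
    matched, lookups = [], 0
    for row in rows:
        lookups += 1
        if row in lookup:
            matched.append(row)
    return matched, lookups


m, n = 2000, 1000
rows = list(range(1, m + 1))
low, high = 501, 500 + n
values = list(range(low, high + 1))

between_matched, between_cmp = scan_between(rows, low, high)
list_matched, list_cmp = scan_in_list(rows, values)
hash_matched, hash_lookups = scan_in_hash(rows, values)

print("BETWEEN comparisons:", between_cmp)
print("IN (list) comparisons:", list_cmp)
print("IN (hash) lookups:", hash_lookups)
```

All three scans select the same rows; only the per-row work differs, which matches the O(m), O(m * n) and O(m) estimates above.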
a between b and c is a macro that expands to b <= a and a <= c.
a in (b,c,d) is a macro that expands to a=b or a=c or a=d.
Assuming your n and nk are integers, both should end up meaning the same. The between variant should be much faster because it's only two compares, versus up to nk - n + 1 compares for the in variant.
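As a quick sanity check of the equivalence, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL here purely as an assumption to keep the example self-contained; the table name theTable and column id come from the question, while the payload column and the values n = 5, nk = 9 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theTable (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)
# 20 rows with consecutive ids 1..20 (no holes, as the question assumes).
conn.executemany(
    "INSERT INTO theTable (payload) VALUES (?)",
    [(f"row {i}",) for i in range(1, 21)],
)

n, nk = 5, 9
ids = list(range(n, nk + 1))

# The BETWEEN form and its expansion into two comparisons.
between_rows = conn.execute(
    "SELECT id FROM theTable WHERE id BETWEEN ? AND ? ORDER BY id", (n, nk)
).fetchall()
expanded_between = conn.execute(
    "SELECT id FROM theTable WHERE ? <= id AND id <= ? ORDER BY id", (n, nk)
).fetchall()

# The IN form and its expansion into a chain of OR-ed equality tests.
placeholders = ", ".join("?" for _ in ids)
in_rows = conn.execute(
    f"SELECT id FROM theTable WHERE id IN ({placeholders}) ORDER BY id", ids
).fetchall()
or_clause = " OR ".join("id = ?" for _ in ids)
expanded_in = conn.execute(
    f"SELECT id FROM theTable WHERE {or_clause} ORDER BY id", ids
).fetchall()

print(between_rows == expanded_between == in_rows == expanded_in)
conn.close()
```

This only confirms that the four queries select the same rows when the ids are consecutive integers. It says nothing about speed, which still needs to be measured on the real database.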