Is there a performance difference between BETWEEN and IN with MySQL or in SQL in general?


I have a set of consecutive rows I want to get based upon their primary key, which is an auto-incrementing integer. Assuming that there are no holes, is there any performance difference between:

SELECT * FROM `theTable` WHERE `id` IN (n, ... nk);

and:

SELECT * FROM `theTable` WHERE `id` BETWEEN n AND nk;


asked Jul 22 '10 11:07

pr1001

2 Answers

BETWEEN should outperform IN in this case (but do measure and check execution plans, too!), especially as n grows and as long as statistics are still accurate. Let's assume:

m is the size of your table
n is the size of your range
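
The advice to measure and check execution plans can be tried out with a small script. This is only a sketch under loud assumptions: it uses Python's built-in `sqlite3` module and an in-memory SQLite table as a stand-in for the MySQL table from the question, the `payload` column and the bounds `n = 100`, `nk = 120` are made up for illustration, and SQLite's planner is not MySQL's, so the plans printed here say nothing definitive about MySQL. On MySQL you would run `EXPLAIN` on your real table instead.

```python
import sqlite3

# Assumption: an in-memory SQLite table stands in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO theTable (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 10001)],
)

n, nk = 100, 120                      # hypothetical bounds of the range
ids = list(range(n, nk + 1))          # every id in the range, no holes
placeholders = ", ".join("?" for _ in ids)

in_sql = f"SELECT * FROM theTable WHERE id IN ({placeholders}) ORDER BY id"
between_sql = "SELECT * FROM theTable WHERE id BETWEEN ? AND ? ORDER BY id"


def plan(sql, params):
    """Return the detail text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


in_plan = plan(in_sql, ids)
between_plan = plan(between_sql, (n, nk))
in_rows = conn.execute(in_sql, ids).fetchall()
between_rows = conn.execute(between_sql, (n, nk)).fetchall()

print("IN plan:     ", in_plan)
print("BETWEEN plan:", between_plan)
print("same rows:   ", in_rows == between_rows)
```

The point of the sketch is the workflow (compare the plans, then confirm both queries return the same rows), not the specific plan text, which varies between database engines and versions.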

Index can be used (`n` is tiny compared to `m`)

In theory, BETWEEN can be implemented with a single "range scan" (Oracle speak) on the primary key index, which then traverses at most n index leaf nodes. The complexity will be O(n + log m).
IN is usually implemented as a series (loop) of n "range scans" on the primary key index. With m being the size of the table, the complexity will be O(n * log m), which is always worse (though the difference is negligible for very small tables m or very small ranges n).
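
The two access patterns above can be mimicked with a toy model. This is a sketch, not how any real database is implemented: a sorted Python list stands in for the primary key index, a hand-written binary search stands in for one index descent, and the function names (`lower_bound`, `fetch_between`, `fetch_in`) and the sizes chosen are made up for illustration. It only counts steps to make the O(n + log m) versus O(n * log m) difference visible.

```python
def lower_bound(keys, target, counter):
    """Binary search for the first position with keys[pos] >= target.

    counter is a one-element list used to count probes into the "index".
    """
    lo, hi = 0, len(keys)
    while lo < hi:
        mid = (lo + hi) // 2
        counter[0] += 1
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def fetch_between(keys, low, high):
    """One descent to find `low`, then walk forward: about log m + n steps."""
    steps = [0]
    pos = lower_bound(keys, low, steps)
    found = []
    while pos < len(keys) and keys[pos] <= high:
        steps[0] += 1
        found.append(keys[pos])
        pos += 1
    return found, steps[0]


def fetch_in(keys, wanted):
    """One descent per listed value: about n * log m steps."""
    steps = [0]
    found = []
    for value in wanted:
        pos = lower_bound(keys, value, steps)
        if pos < len(keys) and keys[pos] == value:
            found.append(keys[pos])
    return found, steps[0]


keys = list(range(1, 1_000_001))      # m = 1,000,000 ids without holes
low, high = 5000, 5099                # n = 100 consecutive ids

between_rows, between_steps = fetch_between(keys, low, high)
in_rows, in_steps = fetch_in(keys, range(low, high + 1))

print("BETWEEN steps:", between_steps)
print("IN steps:     ", in_steps)
```

Both functions return the same ids; only the step counts differ, and the gap widens as n grows.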

Index cannot be used (`n` is a significant portion of `m`)

In either case, you'll get a full table scan and evaluate the predicate on each row:

BETWEEN needs to evaluate two predicates: one for the lower and one for the upper bound. The complexity is O(m).
IN needs to evaluate at most n predicates. The complexity is O(m * n), which is again worse, or perhaps O(m) if the database can optimise the IN list into a hashmap rather than a list of predicates.
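
The full-scan case can be sketched the same way. Again this is a toy model under stated assumptions: a Python list of ids stands in for the table, the function names and sizes are made up, and a Python `set` stands in for whatever hashed structure a database might build for the IN list. Whether a given database actually does that is not something this sketch can show.

```python
def scan_between(rows, low, high):
    """Full scan with at most two comparisons per row."""
    comparisons = 0
    matched = []
    for row_id in rows:
        comparisons += 1              # lower bound check
        if row_id >= low:
            comparisons += 1          # upper bound check
            if row_id <= high:
                matched.append(row_id)
    return matched, comparisons


def scan_in_list(rows, wanted):
    """Full scan comparing each row against the list, value by value."""
    comparisons = 0
    matched = []
    for row_id in rows:
        for value in wanted:
            comparisons += 1
            if row_id == value:
                matched.append(row_id)
                break
    return matched, comparisons


def scan_in_hashed(rows, wanted):
    """Full scan with one hash lookup per row."""
    wanted_set = set(wanted)
    lookups = 0
    matched = []
    for row_id in rows:
        lookups += 1
        if row_id in wanted_set:
            matched.append(row_id)
    return matched, lookups


rows = list(range(1, 1001))           # m = 1000 rows
wanted = list(range(250, 750))        # n = 500, a large share of m

between_rows, between_cmp = scan_between(rows, wanted[0], wanted[-1])
list_rows, list_cmp = scan_in_list(rows, wanted)
hashed_rows, hashed_lookups = scan_in_hashed(rows, wanted)

print("BETWEEN comparisons:", between_cmp)
print("IN (list) comparisons:", list_cmp)
print("IN (hashed) lookups:", hashed_lookups)
```

All three scans return the same rows. The list-based IN grows with m * n, while BETWEEN and the hashed IN both stay proportional to m.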


answered Jan 02 '23 22:01

Lukas Eder

a between b and c is a macro that expands to b <= a and a <= c.

a in (b,c,d) is a macro that expands to a=b or a=c or a=d.

Assuming your n and nk are integers, both should end up meaning the same. The between variant should be much faster because it's only two compares, versus up to nk - n + 1 compares for the in variant.
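
The two expansions can be written out literally to see where the integer assumption matters. This is a minimal sketch in Python rather than SQL: the function names and the sample bounds are made up for illustration, and SQL's NULL handling (three-valued logic) is deliberately left out.

```python
def between(a, b, c):
    """a BETWEEN b AND c, written out as b <= a AND a <= c."""
    return b <= a and a <= c


def in_list(a, values):
    """a IN (v1, v2, ...), written out as a = v1 OR a = v2 OR ..."""
    return any(a == v for v in values)


n, nk = 10, 15                        # hypothetical bounds
values = list(range(n, nk + 1))       # nk - n + 1 values in the IN list

# For integer inputs, both forms select exactly the same values.
integers_agree = all(
    between(a, n, nk) == in_list(a, values) for a in range(0, 30)
)

# For a non-integer input, they differ: 12.5 is inside the range
# but is not equal to any listed value.
fraction_differs = between(12.5, n, nk) and not in_list(12.5, values)

print("agree on integers:", integers_agree)
print("differ on 12.5:   ", fraction_differs)
```

With an integer primary key, as in the question, only the first case applies, so the two queries return the same rows.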

answered Jan 03 '23 00:01

Andomar
