Simple question. Wondering if a long IN clause is a code smell? I don't really know how to justify it. I can't put my finger on why it smells other than that I think it does.
select
name,
code,
capital,
population,
flower,
bird
from us_states
where
code in
('NJ', 'NY', 'PA', 'CA', 'AL', 'AK', 'AZ',
'IL', 'IN', 'KY', 'KS', 'DC', 'MD', 'MA')
How does a database typically implement such a lookup? Is a temporary table made and joined to? Or is it just expanded into a series of logical ORs?
It feels like it should have been a join...
I'm not saying all IN clauses are bad. Sometimes you can't help it. But there are some cases (particularly the longer they get) where the set of elements you're matching against actually comes from somewhere. And shouldn't that be joined on instead?
Is it worth creating (via the application level) a temporary table that has all the elements you want to search against and then doing a real join against that?
select u.*
from us_states u
join #chosen_states t
on u.code = t.code
I think it is a code smell. For one thing, databases have limits as to the number of elements allowed in an IN
clause, and if your SQL is generated dynamically, you may eventually bump up against those limits.
When the list starts to become long-ish, I would convert to using a stored procedure with a temporary table, to avoid any chance of errors.
I doubt performance is a major concern though, IN
clauses are very fast, as they can short-circuit, unlike NOT IN
clauses.
Is it worth creating (via the application level) a temporary table.
The problem with IN
is that it doesn't use an index and the comparison (worst case: x14 here) gets repeated for every row in your source table.
Creating a temp table is a good idea, if you put an index on the join fields.
That way the query can lookup the value directly, using a BTree index that should only take 3 or 4 comparisons worst case log2(14) = 3.something
Which is much faster.
If you're smart you can even use a hash-index
in which case the DB only needs to do 1 comparison, speeding your query up 3 fold compared to the btree index.
Tips for using a temp table
Make sure to use a memory table
Use a hash index
as your primary key.
Try and do the inserts in one statement.
The semi-constant time you'll spend creating the temp-table will be dwarfed by the speedup because of the O(1) lookup time using the hash index.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With