Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is a long IN clause a code smell?

Simple question. Wondering if a long IN clause is a code smell? I don't really know how to justify it. I can't put my finger on why it smells other than that I think it does.

select
  name,
  code,
  capital,
  population,
  flower,
  bird
from us_states
where
  code in
    ('NJ', 'NY', 'PA', 'CA', 'AL', 'AK', 'AZ',
    'IL', 'IN', 'KY', 'KS', 'DC', 'MD', 'MA')

How does a database typically implement such a lookup? Is a temporary table made and joined to? Or is it just expanded into a series of logical ORs?

It feels like it should have been a join...

I'm not saying all IN clauses are bad. Sometimes you can't help it. But there are some cases (particularly the longer they get) where the set of elements you're matching against actually comes from somewhere. And shouldn't that be joined on instead?

Is it worth creating (via the application level) a temporary table that has all the elements you want to search against and then doing a real join against that?

select u.*
from us_states u

join #chosen_states t
on u.code = t.code
like image 655
Mark Canlas Avatar asked Jun 02 '11 19:06

Mark Canlas


Video Answer


2 Answers

I think it is a code smell. For one thing, databases have limits as to the number of elements allowed in an IN clause, and if your SQL is generated dynamically, you may eventually bump up against those limits.

When the list starts to become long-ish, I would convert to using a stored procedure with a temporary table, to avoid any chance of errors.

I doubt performance is a major concern though, IN clauses are very fast, as they can short-circuit, unlike NOT IN clauses.

like image 181
D'Arcy Rittich Avatar answered Oct 12 '22 23:10

D'Arcy Rittich


Is it worth creating (via the application level) a temporary table.

The problem with IN is that it doesn't use an index and the comparison (worst case: x14 here) gets repeated for every row in your source table.

Creating a temp table is a good idea, if you put an index on the join fields.
That way the query can lookup the value directly, using a BTree index that should only take 3 or 4 comparisons worst case log2(14) = 3.something
Which is much faster.

If you're smart you can even use a hash-index in which case the DB only needs to do 1 comparison, speeding your query up 3 fold compared to the btree index.

Tips for using a temp table
Make sure to use a memory table
Use a hash index as your primary key.
Try and do the inserts in one statement.

The semi-constant time you'll spend creating the temp-table will be dwarfed by the speedup because of the O(1) lookup time using the hash index.

like image 45
Johan Avatar answered Oct 12 '22 21:10

Johan