Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does greater-than versus equals make a difference in MySQL SELECT?

Tags:

mysql

I have a large MyISAM table. It's approaching 1 million rows. It's basically a list of items and some information about them.

There are two indices:

  • primary: the item ID
  • date (date) and col (int).

I run two queries:

SELECT * FROM table WHERE date = '2011-02-01' AND col < 5 LIMIT 10

SELECT * FROM table WHERE date < '2011-02-01' AND col < 5 LIMIT 10

The first one finishes in ~0.0005 seconds and the second in ~0.05 seconds. That is 100X difference. Is it wrong for me to expect both of these to run at roughly the same speed? I must not be understanding the indices very well. How can I speed up the second query?

like image 570
burger Avatar asked Feb 04 '11 03:02

burger


2 Answers

Regardless of Mysql it boils down to basic algorithm theory.

Greater than and Less than operations on a large set are slower than Identity operations. With a large data set an ideal data structure for determining less than or greater is a self balancing tree (binary or n-tree). On a a self balanced tree the worst case scenario to find all less/greater is log n.

The ideal data structure for identity lookup is a hashtable. The performance of hashtables is generally O(1) aka fixed time. A hashtable however is not good for greater/less.

Generally a well balanced tree is only slightly less performing than a hashtable (which is how Haskell gets away with using a tree for hashtables).

Thus irregardless of what Mysql does its not surprise that <,> is slower than =

Old Answer below:

Because the first one is like Hashtable lookup since its '=' (particularly if your index is a hashtable) it will be faster than the second one which might work better with a tree like index.

Since MySql allows to configure the index format you can try changing that but I'm rather sure the first will always run faster than the second.

like image 121
Adam Gent Avatar answered Nov 03 '22 01:11

Adam Gent


I'm assuming you have an index on the date column. The first query uses the index, the second query probably does a linear scan (at least over part of the data). A direct fetch is always faster than a linear scan.

like image 23
user183037 Avatar answered Nov 03 '22 01:11

user183037