Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mysql - OR operator not using index

Tags:

sql

mysql

explain

I have a simple invitation table:

CREATE TABLE `invitation` (
  `invitation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `inviter_id` int(10) unsigned NOT NULL,
  `invitee_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`invitation_id`),
  UNIQUE KEY `invitee_inviter_idx` (`invitee_id`,`inviter_id`)
)

I want to select an invitation by inviter 70 to invitee 62 and vice versa:

EXPLAIN SELECT * FROM `invitation` WHERE 
(invitee_id = 70 AND inviter_id = 62) OR (invitee_id = 62 AND inviter_id = 70)

But this query is of type ALL and doesn't use the invitee_inviter_idx. Please tell me what is wrong here ?

Thank you!

==EDIT== Sorry, i was wrong about the schema, it has one more field: request_ts. This time the query plan is ALL.

    CREATE TABLE `invitation` (
      `invitation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `inviter_id` int(10) unsigned NOT NULL,
      `invitee_id` int(10) unsigned NOT NULL,
      `request_ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, 
      PRIMARY KEY (`invitation_id`),
      UNIQUE KEY `invitee_inviter_idx` (`invitee_id`,`inviter_id`)
    )

Here is my exlain result:

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  invitation  ALL invitee_inviter_idx \N  \N      \N  1   Using where
like image 765
robinmag Avatar asked Dec 12 '22 19:12

robinmag


2 Answers

There are at least 3 reasons why your select isn't using the index

1) You used a select *, which includes items not in the index (i.e., invitation_id). That means had it used the index, it would then have to look up the row in the database to fetch the invitation_id value. Had you added invitation_id to the index, it would have used the index. If you had done a select of just invitee_id, inviter_id, it would have used the index.

2) The query optimizer decided it would be better to just scan the table rather than scan a range of the index. When the optimizer is trying to decide full table scan or partial index scan, it doesn't do it for your exact query - it wants a plan that works well in general. One which may be run over an over again. Scanning from invitee_id,inviter_id (62,70) to (70,62) is likely only 8 index entries, but if picked randomly out of 50k items, the average distance would be about 17k items. So on average, a single query will access 1/3 of the index (i.e., pull it into memory), then access the page the row is on (see #1) pulling that into memory. Your rows are so small, accessing just one item likely pulls in 680 rows (8k page by 12 bytes for 3 32 bit #'s) which is 1/70th of table - do 100 queries and likely you've pulled the whole index into memory and the whole table - it makes more sense to take a little longer by scanning the table and use 40% less memory to hold bits of other tables. At some point (which seems to be 65k rows) it stops making sense.

3) What your question said: you used an OR. An OR expression can't be used to look something up in an index - that is, you can't look up 62 or 70. Instead, it produces a range looking up (62,70), then scans to get to (70,62) (see #2 why this can be bad).

You asked "what's wrong here" - it's that you used the OR, which won't scale. Not only do you need to avoid the type ALL, you need to avoid large type RANGES.

I've seen the same issue with other SQL engines and the solution I used was a UNION ALL.

Something like

SELECT * FROM `invitation` WHERE 
    (invitee_id = 70 AND inviter_id = 62)
UNION ALL
SELECT  * FROM `invitation` WHERE
    (invitee_id = 62 AND inviter_id = 70)

That will make it be done as two queries and merge the results without checking for duplicates.

This is much lighter on memory usage and much faster - Just a few pages of the index and the two pages from the table are required and O(log(N)) for each lookup. This is because it's of type const now - your goal was to eliminate the ALL, but switching to a RANGE is nearly as bad to fetch just two rows. Scanning the whole table is O(N) and scanning a RANGE of the index is also O(N) since O(1/3*N) is O(N). In other words, it doesn't scale.

like image 113
Tony Lee Avatar answered Dec 14 '22 09:12

Tony Lee


You just need to get enough rows into the table. MySQL will do a full table scan on small tables simply because it's cheap enough.

My example puts 65k rows into the table and it will use the index.

http://sqlfiddle.com/#!2/63079/1

like image 36
Andreas Wederbrand Avatar answered Dec 14 '22 09:12

Andreas Wederbrand