MySQL Views OR vs IN clause

Tags: mysql, views

MySQL Server version: 5.1.41 on Ubuntu 10.04

I came across a difference in MySQL's behavior when modifying some queries and wanted to know the reason for it.

Basically, I am creating a view. When I query the view, the result set is the same. However, the number of rows read is different for an IN clause than for an OR clause. Below is a simple example:

CREATE TABLE country ( 
    id_country int(11) NOT NULL AUTO_INCREMENT, 
    name varchar(50) NOT NULL, 
    PRIMARY KEY (id_country) 
) ENGINE=InnoDB; 

INSERT INTO country (name) VALUES ('A'), ('B'), ('C'), ('D'), ('E'), ('F'), ('G'), ('H'); 

CREATE TABLE status ( 
    id_status int(11) NOT NULL AUTO_INCREMENT, 
    id_country int(11) NOT NULL, 
    status tinyint(4) NOT NULL, 
    PRIMARY KEY (id_status) 
) ENGINE=InnoDB; 
ALTER TABLE status ADD INDEX ( id_country ); 
ALTER TABLE status ADD FOREIGN KEY ( id_country ) REFERENCES test.country (id_country) ON DELETE RESTRICT ON UPDATE RESTRICT ; 

INSERT INTO status(id_country, status) VALUES 
(1,0), (2,1), (3,0), (4,1), (5,0),(6,1), (7,0), (8,1); 

CREATE ALGORITHM=MERGE VIEW view_country 
AS 
    SELECT c.*, s.id_status, s.status 
    FROM country c JOIN status s ON c.id_country = s.id_country; 

The two EXPLAIN statements below report different numbers of rows examined:

mysql> EXPLAIN EXTENDED  SELECT * FROM view_country WHERE id_country IN (1, 2, 3)\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: s
         type: ref
possible_keys: id_country
          key: id_country
      key_len: 4
          ref: test.c.id_country
         rows: 1
     filtered: 100.00
        Extra: 
2 rows in set, 1 warning (0.00 sec)

Using the OR clause:

mysql> EXPLAIN EXTENDED SELECT * FROM view_country WHERE id_country = 1 OR id_country = 2 OR id_country = 3\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: s
         type: ALL
possible_keys: id_country
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 8
     filtered: 37.50
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: test.s.id_country
         rows: 1
     filtered: 100.00
        Extra: 
2 rows in set, 1 warning (0.00 sec)

If you look at the "rows" column in both plans, the numbers add up differently.

The query with the OR clause reads fewer rows than the one with IN, which adds up for huge tables and joins.

Can someone help me understand why this is so?

Thank you for your time.

itz_nsn asked Nov 14 '22 09:11


1 Answer

Please note that execution plans have a lot to do with the state of your indexes and the size of your tables. MySQL may execute differently even for similar queries, and sometimes MySQL can even guess wrong.
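If you suspect that stale statistics are steering the plan, something like the following may help. It is only a sketch against the tables from your question: ANALYZE TABLE refreshes the stored index statistics, and the Cardinality column in the SHOW INDEX output shows what the optimizer currently believes about each index.

```sql
-- Refresh the index statistics for both tables from the question.
ANALYZE TABLE country, status;

-- Inspect the optimizer's current view of each index;
-- the Cardinality column is an estimate, not an exact count.
SHOW INDEX FROM country;
SHOW INDEX FROM status;
```

With only 8 rows per table the statistics matter little, but on larger tables it is worth re-running EXPLAIN after this to see whether the plan changes.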

The view with the JOIN definitely complicates things, so your SELECT statement is not so simple. Don't be surprised that MySQL chooses a different execution plan for IN versus OR.
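To see what the optimizer actually works with once the MERGE view has been folded into your SELECT, you can follow EXPLAIN EXTENDED with SHOW WARNINGS. This is a minimal sketch reusing the view and predicate from your question; the "1 warning" reported at the end of your EXPLAIN output is the note that carries the rewritten statement.

```sql
-- EXPLAIN EXTENDED records the rewritten query as a note.
EXPLAIN EXTENDED
SELECT * FROM view_country WHERE id_country IN (1, 2, 3);

-- Print that note: the statement after the view has been merged
-- into a join on country and status.
SHOW WARNINGS;
```

Running the same pair of statements for the OR version lets you compare how each predicate is carried into the merged join.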

In the case of the first query, MySQL has chosen to use indexes for both tables, which results in the specific and accurate row count in the EXPLAIN.

However, in the second query, MySQL has chosen to scan all the rows in the status table. This makes sense since there are so few rows, and MySQL has to visit the table anyway because there is no covering index that would return all the needed columns. I wouldn't be surprised if the second query were actually faster than the first. Also, please note that the row counts for scans in EXPLAIN are estimates, so take that into consideration when profiling your queries.

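The first query has to do 6 lookups (3 on the primary key of country, then 3 on the id_country index of status), whereas the second query only has to do 3 lookups (on the primary key of country) after a very short table scan of status.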
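Since EXPLAIN only gives estimates, one way to check the lookup counts yourself is to read the session handler counters around each query. This is a sketch rather than a benchmark; it assumes your account is allowed to run FLUSH STATUS (which needs the RELOAD privilege) and that nothing else runs in the same session between the statements.

```sql
-- Reset this session's status counters.
FLUSH STATUS;
SELECT * FROM view_country WHERE id_country IN (1, 2, 3);
-- Handler_read_key counts index lookups;
-- Handler_read_rnd_next counts rows read during a table scan.
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Repeat for the OR form and compare the counters.
FLUSH STATUS;
SELECT * FROM view_country
WHERE id_country = 1 OR id_country = 2 OR id_country = 3;
SHOW SESSION STATUS LIKE 'Handler_read%';
```

The counters report what the storage engine was actually asked to do, so they are a better basis for comparing the two plans than the rows column alone.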

MySQL has many optimization tricks, some of which apply only in very specific scenarios, and it picks among them based on the current indexes and row counts. There are documented cases where, for similar queries, MySQL takes two different approaches and ends up with the same execution path. There are other cases where two completely different execution plans result in similar performance, and this is one of those cases.

Anyway, I hope this explains to you why there is a difference, but as long as the results are the same, and the performance is similar, there's nothing to be concerned about.

In some cases, as I said earlier, MySQL will just not make the best guess, and then you can use tools like index hints and STRAIGHT_JOIN to steer it. In your case, I think MySQL is behaving just fine.
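For illustration only, here is roughly what steering the optimizer could look like. The sketch queries the base tables directly instead of going through view_country, uses STRAIGHT_JOIN to make MySQL read the tables in the order they are written, and adds an index hint naming the id_country index shown in your EXPLAIN output. Whether this is faster on your real data is something to measure, not assume.

```sql
-- STRAIGHT_JOIN makes MySQL join the tables in the listed order
-- (country first), as in the plan it chose for the IN query.
-- FORCE INDEX asks it to use the id_country index on status.
SELECT STRAIGHT_JOIN c.*, s.id_status, s.status
FROM country c
JOIN status s FORCE INDEX (id_country)
  ON c.id_country = s.id_country
WHERE c.id_country = 1 OR c.id_country = 2 OR c.id_country = 3;
```

Prefix the statement with EXPLAIN to confirm that the hints took effect. Hints like these pin the plan even after the data changes, so I would only reach for them when the optimizer is demonstrably wrong.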

To research performance and execution plans further, check out the following two sites:

  • http://www.mysqlperformanceblog.com/
  • http://explainextended.com/

Marcus Adams answered Nov 17 '22 05:11