MySQL Server version: 5.1.41 On Ubuntu 10.04
I came across a difference in MySQL's behavior when modifying some queries and wanted to know the reason for it.
Basically, I am creating a view. When I query the view, the result set is the same either way.
However, the number of rows read is different for an IN clause than for an OR clause. Below is a simple example:
CREATE TABLE country (
id_country int(11) NOT NULL AUTO_INCREMENT,
name varchar(50) NOT NULL,
PRIMARY KEY (id_country)
) ENGINE=InnoDB;
INSERT INTO country (name) VALUES ('A'), ('B'), ('C'), ('D'), ('E'), ('F'), ('G'), ('H');
CREATE TABLE status (
id_status int(11) NOT NULL AUTO_INCREMENT,
id_country int(11) NOT NULL,
status tinyint(4) NOT NULL,
PRIMARY KEY (id_status)
) ENGINE=InnoDB;
ALTER TABLE status ADD INDEX ( id_country );
ALTER TABLE status ADD FOREIGN KEY ( id_country ) REFERENCES test.country (id_country) ON DELETE RESTRICT ON UPDATE RESTRICT ;
INSERT INTO status(id_country, status) VALUES
(1,0), (2,1), (3,0), (4,1), (5,0),(6,1), (7,0), (8,1);
CREATE ALGORITHM=MERGE VIEW view_country
AS
SELECT c.*, s.id_status, s.status
FROM country c JOIN status s ON c.id_country = s.id_country;
The two EXPLAIN statements below show a different number of rows examined.
mysql> EXPLAIN EXTENDED SELECT * FROM view_country WHERE id_country IN (1, 2, 3)\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: c
type: range
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: NULL
rows: 3
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: s
type: ref
possible_keys: id_country
key: id_country
key_len: 4
ref: test.c.id_country
rows: 1
filtered: 100.00
Extra:
2 rows in set, 1 warning (0.00 sec)
Using the OR Clause
mysql> EXPLAIN EXTENDED SELECT * FROM view_country WHERE id_country = 1 OR id_country = 2 OR id_country = 3\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: s
type: ALL
possible_keys: id_country
key: NULL
key_len: NULL
ref: NULL
rows: 8
filtered: 37.50
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: c
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.s.id_country
rows: 1
filtered: 100.00
Extra:
2 rows in set, 1 warning (0.00 sec)
If you look at the "rows" column in both plans, they add up differently.
The query with the IN clause reads fewer rows than the one with OR (3 versus 8 on the first table), which adds up for huge tables and joins.
Can someone help me understand why this is so?
Thank you for your time.
Please note that execution plans have a lot to do with the state of your indexes and the size of your tables. MySQL may execute differently even for similar queries, and sometimes MySQL can even guess wrong.
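Because the plan depends on index statistics, it can be worth refreshing them before comparing plans. This is a minimal sketch that assumes the country and status tables and the view from the question. On tables this small the plan may well stay the same.

```sql
-- Refresh the key distribution statistics the optimizer bases its estimates on.
ANALYZE TABLE country, status;

-- Re-check both plans afterwards to see whether the choice changed.
EXPLAIN SELECT * FROM view_country WHERE id_country IN (1, 2, 3);
EXPLAIN SELECT * FROM view_country WHERE id_country = 1 OR id_country = 2 OR id_country = 3;
```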
The view with the JOIN definitely complicates things, so your SELECT statement is not so simple. Don't be surprised that MySQL chooses a different execution plan for IN versus OR.
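To see what the optimizer actually works with once the view has been merged, EXPLAIN EXTENDED can be followed by SHOW WARNINGS in the same session (this is the "1 warning" reported under each plan in the question). The sketch below uses the question's view. The exact text of the rewritten query depends on the server version.

```sql
EXPLAIN EXTENDED
SELECT * FROM view_country
WHERE id_country = 1 OR id_country = 2 OR id_country = 3;

-- The Note-level message holds the query as rewritten by the optimizer,
-- with view_country expanded into its base tables country and status.
SHOW WARNINGS;
```

Comparing the rewritten text for the IN and OR versions shows how each condition was applied to the underlying tables.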
In the case of the first query, MySQL has chosen to use indexes for both tables, which results in the specific and accurate row count in the EXPLAIN.
However, in the second query, MySQL has chosen to scan all the rows in the status table. This makes sense since there are so few rows and MySQL has to visit the table anyway because there is no covering index that will return all needed columns. I wouldn't be surprised if the second query is actually faster than the first. Also, please note that the row counts (for scans) in EXPLAIN are estimates, so take that into consideration when profiling your queries.
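Since the EXPLAIN row counts are estimates, one way to compare what each query really reads is the session handler counters. This is a rough sketch, assuming the question's view and an account with the RELOAD privilege (needed for FLUSH STATUS). The counter values will depend on your data.

```sql
-- Reset the session status counters before each measurement.
FLUSH STATUS;
SELECT * FROM view_country WHERE id_country IN (1, 2, 3);
-- Handler_read_key counts index lookups; Handler_read_rnd_next grows with table scans.
SHOW SESSION STATUS LIKE 'Handler_read%';

FLUSH STATUS;
SELECT * FROM view_country WHERE id_country = 1 OR id_country = 2 OR id_country = 3;
SHOW SESSION STATUS LIKE 'Handler_read%';
```

Run each group in the same session, since the counters are tracked per connection.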
The first query has to do 6 lookups, whereas the second query only has to do 3 lookups after a very short table scan.
There are many tricks that MySQL does that are sometimes limited to very specific scenarios to try to optimize your query, based on the current indexes and row counts. There are documented cases where, for similar queries, MySQL will take two different approaches and end up with the same execution path. There are other cases where two completely different execution plans result in similar performance, and this is one of those cases.
Anyway, I hope this explains to you why there is a difference, but as long as the results are the same, and the performance is similar, there's nothing to be concerned about.
In some cases, as I said earlier, MySQL will just not make the best guess, and then you can use tools like index hints and natural joins. In your case, I think MySQL is behaving just fine.
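If you ever do need to override the optimizer, a hint could look like the sketch below. It queries the base tables directly rather than the view, because hints attach to table references, and it assumes the index on status is named id_country, as the EXPLAIN output in the question suggests. Confirm the effect with EXPLAIN before relying on it.

```sql
-- Ask the optimizer to use the id_country index on status instead of a full scan.
SELECT c.*, s.id_status, s.status
FROM country c
JOIN status s FORCE INDEX (id_country) ON c.id_country = s.id_country
WHERE c.id_country = 1 OR c.id_country = 2 OR c.id_country = 3;

-- Alternatively, fix the join order: STRAIGHT_JOIN reads the tables
-- in the order they are listed in the FROM clause (country first here).
SELECT STRAIGHT_JOIN c.*, s.id_status, s.status
FROM country c
JOIN status s ON c.id_country = s.id_country
WHERE c.id_country = 1 OR c.id_country = 2 OR c.id_country = 3;
```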
To research performance and execution plans more check out the two following sites: