I have a query that is giving me problems and I can't understand why MySQL's query optimizer is behaving the way it is. Here is the background info:
I have 3 tables. Two are relatively small and one is large.
Table 1 (very small, 727 rows):
CREATE TABLE
ipa
(ipa_id
int(11) NOT NULL AUTO_INCREMENT,ipa_code
int(11) DEFAULT NULL,ipa_name
varchar(100) DEFAULT NULL,payorcode
varchar(2) DEFAULT NULL,compid
int(11) DEFAULT '2'
PRIMARY KEY (ipa_id
),
KEYipa_code
(ipa_code
) ) ENGINE=MyISAM
Table 2 (smallish, 59455 rows):
CREATE TABLE
assign_ipa
(assignid
int(11) NOT NULL AUTO_INCREMENT,ipa_id
int(11) NOT NULL,userid
int(11) NOT NULL,username
varchar(20) DEFAULT NULL,compid
int(11) DEFAULT NULL,PayorCode
char(10) DEFAULT NULL
PRIMARY KEY (assignid
),
UNIQUE KEYassignid
(assignid
,ipa_id
),
KEYipa_id
(ipa_id
)
) ENGINE=MyISAM
Table 3 (large, 24,711,730 rows):
CREATE TABLE
master_final
(IPA
int(11) DEFAULT NULL,MbrCt
smallint(6) DEFAULT '0',PayorCode
varchar(4) DEFAULT 'WC',
KEYidx_IPA
(IPA
)
) ENGINE=MyISAM DEFAULT
Now for the query. I'm doing a 3-way join using the first two smaller tables to essentially subset the big table on one of it's indexed values. Basically, I get a list of IDs for a user, SJOnes and query the big file for those IDs.
mysql> explain
SELECT master_final.PayorCode, sum(master_final.Mbrct) AS MbrCt
FROM master_final
INNER JOIN ipa ON ipa.ipa_code = master_final.IPA
INNER JOIN assign_ipa ON ipa.ipa_id = assign_ipa.ipa_id
WHERE assign_ipa.username = 'SJones'
GROUP BY master_final.PayorCode, master_final.ipa\G;
************* 1. row *************
id: 1
select_type: SIMPLE
table: master_final
type: ALL
possible_keys: idx_IPA
key: NULL
key_len: NULL
ref: NULL
rows: 24711730
Extra: Using temporary; Using filesort
************* 2. row *************
id: 1
select_type: SIMPLE
table: ipa
type: ref
possible_keys: PRIMARY,ipa_code
key: ipa_code
key_len: 5
ref: wc_test.master_final.IPA
rows: 1
Extra: Using where
************* 3. row *************
id: 1
select_type: SIMPLE
table: assign_ipa
type: ref
possible_keys: ipa_id
key: ipa_id
key_len: 4
ref: wc_test.ipa.ipa_id
rows: 37
Extra: Using where
3 rows in set (0.00 sec)
This query takes forever (like 30 minutes!). The explain statement tells me why, it's doing a full table scan on the big table even though there is a perfectly good index. It's not using it. I don't understand this. I can look at the query and see that it's only needs to query a couple of IDs from the big table. If I can do it, why can't MySQL's optimizer do it?
To illustrate, here are the IDs associated with 'SJones':
mysql> select username, ipa_id from assign_ipa where username='SJones';
+----------+--------+
| username | ipa_id |
+----------+--------+
| SJones | 688 |
| SJones | 689 |
+----------+--------+
2 rows in set (0.02 sec)
Now, I can rewrite the query substituting the ipa_id values for the username in the where clause. To me this is equivalent to the original query. MySQL sees it differently. If I do this, the optimizer makes use of the index on the big table.
mysql> explain
SELECT master_final.PayorCode, sum(master_final.Mbrct) AS MbrCt
FROM master_final
INNER JOIN ipa ON ipa.ipa_code = master_final.IPA
INNER JOIN assign_ipa ON ipa.ipa_id = assign_ipa.ipa_id
*WHERE assign_ipa.ipa_id in ('688','689')*
GROUP BY master_final.PayorCode, master_final.ipa\G;
************* 1. row *************
id: 1
select_type: SIMPLE
table: ipa
type: range
possible_keys: PRIMARY,ipa_code
key: PRIMARY
key_len: 4
ref: NULL
rows: 2
Extra: Using where; Using temporary; Using filesort
************* 2. row *************
id: 1
select_type: SIMPLE
table: assign_ipa
type: ref
possible_keys: ipa_id
key: ipa_id
key_len: 4
ref: wc_test.ipa.ipa_id
rows: 37
Extra: Using where
************* 3. row *************
id: 1
select_type: SIMPLE
table: master_final
type: ref
possible_keys: idx_IPA
key: idx_IPA
key_len: 5
ref: wc_test.ipa.ipa_code
rows: 34953
Extra: Using where
3 rows in set (0.00 sec)
The only thing I've changed is a where clause that doesn't even directly hit the big table. And yet, the optimizer uses the index 'idx_IPA' on the big table and the full table scan is no longer used. The query when re-written like this is very fast.
OK, that's a lot of background. Now my question. Why should the where clause matter to the optimizer? Either where clause will return the same result set from the smaller table, and yet I'm getting dramatically different results depending on which one I use. Obviously, I want to use the where clause containing the username rather than trying to pass all associated IDs to the query. As written though, this is not possible?
Thanks for sticking with me. I know its a very longish question.
To optimize update operations, we should keep the transaction size as short as possible. We need to consider the lock escalation mode of the modified table to minimize the usage of too many resources. Analyzing the execution plan may help to resolve performance bottlenecks of the update query.
Not quite sure if I'm right, but I think the following is happening here. This:
WHERE assign_ipa.username = 'SJones'
may create a temporary table, since it requires a full table scan. Temporary tables have no indexes, and they tend to slow down things down a lot.
The second case
INNER JOIN ipa ON ipa.ipa_code = master_final.IPA
INNER JOIN assign_ipa ON ipa.ipa_id = assign_ipa.ipa_id
WHERE assign_ipa.ipa_id in ('688','689')
on the other hand allows for joining of indexes, which is fast. Additionally, it can be transformed to
SELECT .... FROM master_final WHERE IDA IN (688, 689) ...
and I think MySQL is doing that, too.
Creating an index on assign_ipa.username may help.
Edit
I rethought the problem and now have a different explanation.
The reason of course is the missing index. This means that MySQL has no clue how large the result of querying assign_ipa would be (MySQL does not store counts), so it starts with the joins first, where it can relay on keys.
That's what row 2 and 3 of explain log tell us.
And after that, it tries to filter the result by assign_ipa.username, which has no key, as stated in row 1.
As soon as there is an index, it filters assign_ipa first, and joins afterwards, using the according indexes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With