I have a table Assets
on an InnoDB engine defined as:
CREATE TABLE Assets (
qid SMALLINT(5) NOT NULL,
sid BIGINT(20) NOT NULL AUTO_INCREMENT,
...
PRIMARY KEY (sid,qid),
KEY sid (sid)
);
I'm running the following query:
SELECT COUNT(*) FROM Assets WHERE sid>10000;
On my machine, this query takes about 30 seconds with 2 millions entries in the table. Now, if I modify the query to use indices, the results vary widely:
SELECT COUNT(*) FROM Assets USE INDEX(<index>) WHERE sid>10000;
NO INDEX
: No explicit USE INDEX
, i.e., the first SELECT
query : 30 secs
KEY sid (sid)
: 1.5 secs
KEY cid (sid,qid)
: 1.5 secs
PRIMARY
: I used USE INDEX(PRIMARY)
within the query. : 30 secs
So these are my questions:
I thought a query would automatically use the primary key as its index, based on this. And yet there's a major difference between USE INDEX (cid)
and NO INDEX
. What's the difference? Also, how I do explicitly give the primary key as the index?
If NO INDEX
doesn't actually use the primary key as an index, what does USE INDEX(PRIMARY)
do that causes it to have the same running time as NO INDEX
?
Is there a difference (not just performance wise) between USE INDEX(sid)
and USE INDEX(cid)
in a query that only filters by sid
?
Forgive the long post, but I wanted to let it be open to discussion.
Ok, here's what I have found out so far:
First of all, I'm told the key setup should be either: PRIMARY KEY(qid,sid), KEY(sid)
or PRIMARY KEY(sid,qid), KEY(qid)
. I don't really understand the difference. If someone does, please let me know.
Secondly, the KEY sid
(sid
) references far fewer index pages than a larger key, so it tends to be faster. As for the difference between using the PRIMARY KEY as the index and a proper KEY (even if they use the same fields), I've been told it's something like this:
Primary keys index the entire table data with the fields of the primary key. That means that the PRIMARY KEY and the data are stored together. So a query using the PRIMARY KEY would have to go through the entire table data, which even indexed would bog down on large uncacheable tables.
With discrete keys, the number of rows may be the same, but a much smaller index (consisting of the indicated fields) is scanned, which hits a smaller number of disk blocks, and hence runs much faster. I'm assuming this is also the reason for the difference in using USE INDEX(cid)
and using the primary key as index, both of which have the same fields.
From my experience, having an index which is a subset of another one tends to slow things down.. but you mileage may vary, since you have to consider a lot of things when dealing with indexes..
For instance, if you read often and rarely change data, then having many indexes might help you more; if your operations involve a lot of insert/update/delete then having too many indexes might slow you down quite a bit.
If your primary key is (sid, qid) then I see no fit on having another key (sid) which the engine can probably retrieve as a prefix of the PK. I'd rather add an index on qid if I'm going do leverage it - that is if I have some queries filtering or sorting on that field or if I have some JOINs on that field..
As of the order of the fields on the primary key, I usually try to determine how they will be used in my queries: if all my queries make use of sid and some use both sid and qid, then choose (sid,qid); if they all use qid and only some also use sid, then choose (qid, sid); if they happen to use either sid or qid, then have a PK (sid, qid) and another key (qid), so that queries using both fields will be using your PK, the same will happen for queries using just sid, and finally those using just qid will be using the (qid) key.
I'm just a bit confused about the use(primary) forcing mysql not to use indexes, but that might be something (a bug?) related to your mysql version..
Here you can find some hints on index hints: http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
In general try not to poke too much with index hints, the optimizer usually does a good job! if it doesn't, probably there is a flaw somewhere or it simply considered that a table scan is faster because indexes are not selective enough.
Furthermore, you might sometimes be in need of a table optimization to refresh index statistics.. but since you're using InnoDB it is probably not the case...
HTH
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With