I'm going through an application and trying to optimize some queries and I'm really struggling with a few of them. Here's an example:
SELECT `Item` . * , `Source` . * , `Keyword` . * , `Author` . * FROM `items` AS `Item` JOIN `sources` AS `Source` ON ( `Item`.`source_id` = `Source`.`id` ) JOIN `authors` AS `Author` ON ( `Item`.`author_id` = `Author`.`id` ) JOIN `items_keywords` AS `ItemsKeyword` ON ( `Item`.`id` = `ItemsKeyword`.`item_id` ) JOIN `keywords` AS `Keyword` ON ( `Keyword`.`id` = `ItemsKeyword`.`keyword_id` ) JOIN `keywords_profiles` AS `KeywordsProfile` ON ( `Keyword`.`id` = `KeywordsProfile`.`keyword_id` ) JOIN `profiles` AS `Profile` ON ( `Profile`.`id` = `KeywordsProfile`.`profile_id` ) WHERE `KeywordsProfile`.`profile_id` IN ( 17 ) GROUP BY `Item`.`id` ORDER BY `Item`.`timestamp` DESC , `Item`.`id` DESC LIMIT 0 , 20;
This one is taking 10-30 seconds...in the tables referenced, there are about 500k author rows, and about 750k items and items_keywords rows. Everything else is less than 500 rows.
Here's the explain output: http://img.skitch.com/20090220-fb52wd7jf58x41ikfxaws96xjn.jpg
EXPLAIN is relatively new to me, but I went through this line by line and it all seems fine. Not sure what else I can do, as I've got indexes on everything...what am I missing?
The server this sits on is just a 256 slice over at slicehost, but there's nothing else running on it and the CPU is at 0% before its run. And yet still it cranks away on this query. Any ideas?
EDIT: Some further info; one of the things that makes this really frustrating is that if I repeatedly run this query, it takes less than .1 seconds. I'm assuming this is due to the query cache, but if I run RESET QUERY CACHE before it, it still runs extremely quickly. It's only after I wait a little while or run some other queries that the 10-30 second times return. All the tables are MyISAM...does this indicate that MySQL is loading stuff into memory and that's why it runs so much faster for awhile?
EDIT 2: Thanks so much to everyone for your help...an update...I cut everything down to this:
SELECT i.id FROM items AS i ORDER BY i.timestamp DESC, i.id DESC LIMIT 0, 20;
Consistently took 5-6 seconds, despite there only being 750k records in the DB. Once I dropped the 2nd column on the ORDER BY clause, it was pretty much instant. There's obviously several things going on here, but when I cut the query down to this:
SELECT i.id FROM items AS i JOIN items_keywords AS ik ON ( i.id = ik.item_id ) JOIN keywords AS k ON ( k.id = ik.keyword_id ) JOIN keywords_profiles AS kp ON ( k.id = kp.keyword_id ) WHERE kp.profile_id IN (139) ORDER BY i.timestamp DESC LIMIT 20;
It's still taking 10+ seconds...what else can I do?
Minor curiosity: on the explain, the rows column for items_keywords is always 1544, regardless of what profile_id I'm using in the query. shouldn't it change depending on the number of items associated with that profile?
EDIT 3: Ok, this is getting ridiculous :). If I drop the ORDER BY clause entirely, things are very speedy and the temp table / filesort disappears from explain. There's currently an index on the item.timestamp column, but is it not being used for some reason? I thought I remembered something about mysql only using one index per table or something? should I create a multi-column index over all the columns on the items table that this query references (source_id, author_id, timestamp, etc)?
Try this and see how it does:
SELECT i.*, s.*, k.*, a.*
FROM items AS i
JOIN sources AS s ON (i.source_id = s.id)
JOIN authors AS a ON (i.author_id = a.id)
JOIN items_keywords AS ik ON (i.id = ik.item_id)
JOIN keywords AS k ON (k.id = ik.keyword_id)
WHERE k.id IN (SELECT kp.keyword_id
FROM keywords_profiles AS kp
WHERE kp.profile_id IN (17))
ORDER BY i.timestamp DESC, i.id DESC
LIMIT 0, 20;
I factored out a couple of the joins into a non-correlated subquery, so you wouldn't have to do a GROUP BY
to map the result to distinct rows.
Actually, you may still get multiple rows per i.id
in my example, depending on how many keywords map to a given item and also to profile_id
17.
The filesort
reported in your EXPLAIN report is probably due to the combination of GROUP BY
and ORDER BY
using different fields.
I agree with @ʞɔıu's answer that the speedup is probably because of key caching.
It looks okay, every row in the explain
is using an index of some sort. One possible worry is the filesort
bit. Try running the query without the order by
clause and see if that improves it.
Then, what I would do is gradually take out each join
until you (hopefully) get that massive speed increase, then concentrate on why that's happening.
The reason I mention the filesort
is because I can't see a mention of timestamp anywhere in the explain
output (even though it's your primary sort criteria) - it might be requiring a full non-indexed sort.
UPDATE#1:
Based on edit#2, the query:
SELECT i.id
FROM items AS i
ORDER BY i.timestamp DESC, i.id DESC
LIMIT 0, 20;
takes 5-6 seconds. That's abhorrent. Try creating a composite index on both TIMESTAMP
and ID
and see if that improves it:
create index timestamp_id on items(timestamp,id);
select id from items order by timestamp desc,id desc limit 0,20;
select id from items order by timestamp,id limit 0,20;
select id from items order by timestamp desc,id desc;
select id from items order by timestamp,id;
On one of the tests, I've left off the descending bit (DB2 for one sometimes doesn't use indexes if they're in the opposite order). The other variation is to take off the limit in case that's affecting it.
For your query to run fast, you need:
Create an index: CREATE INDEX ix_timestamp_id ON items (timestamp, id)
id
's on sources
, authors
and keywords
are primary keys.Force MySQL to use this index for items, and perform NESTED LOOP
joins for other items:
EXPLAIN EXTENDED
SELECT Item
.*, Source
. * , Keyword
. * , Author
. *
FROM items
AS Item
FORCE INDEX FOR ORDER BY (ix_timestamp_id)
JOIN items_keywords
AS ItemsKeyword
FORCE INDEX (ix_item_keyword) ON ( Item
.id
= ItemsKeyword
.item_id
AND ItemsKeyword
.keyword_id
IN
(
SELECT keyword_id
FROM keywords_profiles
AS KeywordsProfile
FORCE INDEX (ix_keyword_profile)
WHERE KeywordsProfile
.profile_id
= 17
)
)
JOIN sources
AS Source
FORCE INDEX (primary) ON ( Item
.source_id
= Source
.id
)
JOIN authors
AS Author
FORCE INDEX (primary) ON ( Item
.author_id
= Author
.id
)
JOIN keywords
AS Keyword
FORCE INDEX (primary) ON ( Keyword
.id
= ItemsKeyword
.keyword_id
)
ORDER BY Item
.timestamp DESC, Item
.id DESC
LIMIT 0, 20
As you can see, we get rid of GROUP BY
, push the subquery into the JOIN
condition and force PRIMARY KEY
s to be used for joins.
That's how we ensure that NESTED LOOPS
with items
as a leading tables will be used for all joins.
As a result:
1, 'PRIMARY', 'Item', 'index', '', 'ix_timestamp_id', '12', '', 20, 2622845.00, '' 1, 'PRIMARY', 'Author', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.Item.author_id', 1, 100.00, '' 1, 'PRIMARY', 'Source', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.Item.source_id', 1, 100.00, '' 1, 'PRIMARY', 'ItemsKeyword', 'ref', 'PRIMARY', 'PRIMARY', '4', 'test.Item.id', 1, 100.00, 'Using where; Using index' 1, 'PRIMARY', 'Keyword', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.ItemsKeyword.keyword_id', 1, 100.00, '' 2, 'DEPENDENT SUBQUERY', 'KeywordsProfile', 'unique_subquery', 'PRIMARY', 'PRIMARY', '8', 'func,const', 1, 100.00, 'Using index; Using where'
, and when we run this, we get
20 rows fetched in 0,0038s (0,0019s)
There are 500k
in items
, 600k
in items_keywords
, 512
values in keywords
and 512
values in keywords_profiles
(all with profile 17
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With