Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's wrong with this query? EXPLAIN looks fine to me

I'm going through an application and trying to optimize some queries and I'm really struggling with a few of them. Here's an example:

SELECT `Item` . * , `Source` . * , `Keyword` . * , `Author` . *
FROM `items` AS `Item`
JOIN `sources` AS `Source` ON ( `Item`.`source_id` = `Source`.`id` )
JOIN `authors` AS `Author` ON ( `Item`.`author_id` = `Author`.`id` )
JOIN `items_keywords` AS `ItemsKeyword` ON ( `Item`.`id` = `ItemsKeyword`.`item_id` )
JOIN `keywords` AS `Keyword` ON ( `Keyword`.`id` = `ItemsKeyword`.`keyword_id` )
JOIN `keywords_profiles` AS `KeywordsProfile` ON ( `Keyword`.`id` = `KeywordsProfile`.`keyword_id` )
JOIN `profiles` AS `Profile` ON ( `Profile`.`id` = `KeywordsProfile`.`profile_id` )
WHERE `KeywordsProfile`.`profile_id` IN ( 17 )
GROUP BY `Item`.`id`
ORDER BY `Item`.`timestamp` DESC , `Item`.`id` DESC
LIMIT 0 , 20;

This one is taking 10-30 seconds...in the tables referenced, there are about 500k author rows, and about 750k items and items_keywords rows. Everything else is less than 500 rows.

Here's the explain output: http://img.skitch.com/20090220-fb52wd7jf58x41ikfxaws96xjn.jpg

EXPLAIN is relatively new to me, but I went through this line by line and it all seems fine. Not sure what else I can do, as I've got indexes on everything...what am I missing?

The server this sits on is just a 256 slice over at slicehost, but there's nothing else running on it and the CPU is at 0% before its run. And yet still it cranks away on this query. Any ideas?

EDIT: Some further info; one of the things that makes this really frustrating is that if I repeatedly run this query, it takes less than .1 seconds. I'm assuming this is due to the query cache, but if I run RESET QUERY CACHE before it, it still runs extremely quickly. It's only after I wait a little while or run some other queries that the 10-30 second times return. All the tables are MyISAM...does this indicate that MySQL is loading stuff into memory and that's why it runs so much faster for awhile?

EDIT 2: Thanks so much to everyone for your help...an update...I cut everything down to this:

SELECT i.id
FROM items AS i
ORDER BY i.timestamp DESC, i.id DESC
LIMIT 0, 20;

Consistently took 5-6 seconds, despite there only being 750k records in the DB. Once I dropped the 2nd column on the ORDER BY clause, it was pretty much instant. There's obviously several things going on here, but when I cut the query down to this:

SELECT i.id
FROM items AS i
JOIN items_keywords AS ik ON ( i.id = ik.item_id )
JOIN keywords AS k ON ( k.id = ik.keyword_id )
JOIN keywords_profiles AS kp ON ( k.id = kp.keyword_id )
WHERE kp.profile_id IN (139)
ORDER BY i.timestamp DESC
LIMIT 20;

It's still taking 10+ seconds...what else can I do?

Minor curiosity: on the explain, the rows column for items_keywords is always 1544, regardless of what profile_id I'm using in the query. shouldn't it change depending on the number of items associated with that profile?

EDIT 3: Ok, this is getting ridiculous :). If I drop the ORDER BY clause entirely, things are very speedy and the temp table / filesort disappears from explain. There's currently an index on the item.timestamp column, but is it not being used for some reason? I thought I remembered something about mysql only using one index per table or something? should I create a multi-column index over all the columns on the items table that this query references (source_id, author_id, timestamp, etc)?

like image 285
Ryan Waggoner Avatar asked Feb 20 '09 01:02

Ryan Waggoner


3 Answers

Try this and see how it does:

SELECT i.*, s.*, k.*, a.*
FROM items AS i
 JOIN sources AS s ON (i.source_id = s.id)
 JOIN authors AS a ON (i.author_id = a.id)
 JOIN items_keywords AS ik ON (i.id = ik.item_id)
 JOIN keywords AS k ON (k.id = ik.keyword_id)
WHERE k.id IN (SELECT kp.keyword_id
           FROM keywords_profiles AS kp
           WHERE kp.profile_id IN (17))
ORDER BY i.timestamp DESC, i.id DESC
LIMIT 0, 20;

I factored out a couple of the joins into a non-correlated subquery, so you wouldn't have to do a GROUP BY to map the result to distinct rows.

Actually, you may still get multiple rows per i.id in my example, depending on how many keywords map to a given item and also to profile_id 17.

The filesort reported in your EXPLAIN report is probably due to the combination of GROUP BY and ORDER BY using different fields.

I agree with @ʞɔıu's answer that the speedup is probably because of key caching.

like image 180
Bill Karwin Avatar answered Nov 12 '22 01:11

Bill Karwin


It looks okay, every row in the explain is using an index of some sort. One possible worry is the filesort bit. Try running the query without the order by clause and see if that improves it.

Then, what I would do is gradually take out each join until you (hopefully) get that massive speed increase, then concentrate on why that's happening.

The reason I mention the filesort is because I can't see a mention of timestamp anywhere in the explain output (even though it's your primary sort criteria) - it might be requiring a full non-indexed sort.

UPDATE#1:

Based on edit#2, the query:

SELECT i.id
    FROM items AS i
    ORDER BY i.timestamp DESC, i.id DESC
    LIMIT 0, 20;

takes 5-6 seconds. That's abhorrent. Try creating a composite index on both TIMESTAMP and ID and see if that improves it:

create index timestamp_id on items(timestamp,id);
select id from items order by timestamp desc,id desc limit 0,20;
select id from items order by timestamp,id limit 0,20;
select id from items order by timestamp desc,id desc;
select id from items order by timestamp,id;

On one of the tests, I've left off the descending bit (DB2 for one sometimes doesn't use indexes if they're in the opposite order). The other variation is to take off the limit in case that's affecting it.

like image 33
paxdiablo Avatar answered Nov 12 '22 01:11

paxdiablo


For your query to run fast, you need:

  1. Create an index: CREATE INDEX ix_timestamp_id ON items (timestamp, id)

    • Ensure that id's on sources, authors and keywords are primary keys.
    • Force MySQL to use this index for items, and perform NESTED LOOP joins for other items:

      EXPLAIN EXTENDED SELECT Item.*, Source . * , Keyword . * , Author . * FROM items AS Item FORCE INDEX FOR ORDER BY (ix_timestamp_id) JOIN items_keywords AS ItemsKeyword FORCE INDEX (ix_item_keyword) ON ( Item.id = ItemsKeyword.item_id AND ItemsKeyword.keyword_id IN ( SELECT keyword_id FROM keywords_profiles AS KeywordsProfile FORCE INDEX (ix_keyword_profile) WHERE KeywordsProfile.profile_id = 17 ) ) JOIN sources AS Source FORCE INDEX (primary) ON ( Item.source_id = Source.id ) JOIN authors AS Author FORCE INDEX (primary) ON ( Item.author_id = Author.id ) JOIN keywords AS Keyword FORCE INDEX (primary) ON ( Keyword.id = ItemsKeyword.keyword_id ) ORDER BY Item.timestamp DESC, Item.id DESC LIMIT 0, 20

As you can see, we get rid of GROUP BY, push the subquery into the JOIN condition and force PRIMARY KEYs to be used for joins.

That's how we ensure that NESTED LOOPS with items as a leading tables will be used for all joins.

As a result:

1, 'PRIMARY', 'Item',         'index',  '', 'ix_timestamp_id', '12', '', 20, 2622845.00, ''
1, 'PRIMARY', 'Author',       'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.Item.author_id', 1, 100.00, ''
1, 'PRIMARY', 'Source',       'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.Item.source_id', 1, 100.00, ''
1, 'PRIMARY', 'ItemsKeyword', 'ref', 'PRIMARY', 'PRIMARY', '4', 'test.Item.id', 1, 100.00, 'Using where; Using index'
1, 'PRIMARY', 'Keyword',      'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.ItemsKeyword.keyword_id', 1, 100.00, ''
2, 'DEPENDENT SUBQUERY',      'KeywordsProfile', 'unique_subquery', 'PRIMARY', 'PRIMARY', '8', 'func,const', 1, 100.00, 'Using index; Using where'

, and when we run this, we get

20 rows fetched in 0,0038s (0,0019s)

There are 500k in items, 600k in items_keywords, 512 values in keywords and 512 values in keywords_profiles (all with profile 17).

like image 1
Quassnoi Avatar answered Nov 12 '22 01:11

Quassnoi