Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing ORDER BY Query on large table MySQL

I have a browse category query that im trying to optimize. Im ending up with Using temporary; Using filesort in the explain and the query is slow on a category with say 60,000 rows. If i remove the Order By clauses the query runs very fast .05 seconds to do 60,000 rows. With the Order By clauses its very slow around 5 seconds. Parts contains some 500,000 rows as does Parts_Category.

I have a group index on Parts (status, level, warehouse, updated) called sort_index

At the top of the explain I have | ALL | Using temporary; Using filesort

All the other indexes are showing OK. Can somebody please tell me what the problem might be? Im out of ideas. Maybe i should rearrange this query so i can get better performance perhaps?

query.

SELECT Parts.*, Image.type, Image.width, Image.height,
(SELECT name FROM Location_State WHERE id = Parts.state_id) AS state, 
(SELECT name FROM Location_Region WHERE id = Parts.region_id) AS region, 
(SELECT start_date FROM Promotion WHERE id = Parts.promotion_id) AS promotion_start_date, 
(SELECT end_date FROM Promotion WHERE id = Parts.promotion_id) AS promotion_end_date 
FROM ( SELECT parts_id FROM Parts_Category WHERE Parts_Category.category_id = '40' 
UNION SELECT parts_id FROM Parts_Category WHERE Parts_Category.main_category_id = '40') cid 
LEFT JOIN Image ON Parts.image_id = Image.id
JOIN Parts ON Parts.id = cid.parts_id AND Parts.status = 'A'
ORDER BY Parts.level DESC, Parts.warehouse DESC, Parts.updated DESC LIMIT 0, 15
Table structure for table Parts

Field   Type    Null    Default
id  int(11) No  auto_increment
image_id    int(11) Yes 0
gallery_id  int(11) Yes 0
image_count int(3)  Yes 0
promotion_id    int(11) Yes 0
country_id  int(11) Yes NULL
state_id    int(11) Yes NULL
region_id   int(11) Yes NULL
city_id int(11) Yes NULL
area_id int(11) Yes NULL
updated datetime    Yes 0000-00-00 00:00:00
entered datetime    Yes 0000-00-00 00:00:00
renewal_date    date    Yes 0000-00-00
discount_id varchar(10) Yes NULL
title           varchar(100)    Yes 
search_title    varchar(255)    Yes 
warehouse   varchar(50) Yes 
url varchar(255)    Yes 
display_url varchar(255)    Yes 
friendly_url    varchar(100)    Yes NULL
description varchar(255)    Yes 
keywords    varchar(1000)   Yes NULL
attachment_file varchar(255)    Yes 
attachment_caption  varchar(255)    Yes 
status  char(1) Yes 
level   tinyint(3)  Yes 0
worldwide   tinyint(1)  Yes 0
random_number   int(11) Yes NULL
reminder    tinyint(4)  Yes NULL
category_search varchar(1000)   Yes 
video_snippet   varchar(1000)   Yes 
importID    int(11) Yes 0

Indexes

PRIMARY             518623       id
random_number INDEX 32201   random_number
country_id  INDEX   1       country_id
state_id    INDEX   8       state_id
region_id   INDEX   5       region_id
renewal_date    INDEX   1       renewal_date
worldwide   INDEX   1       worldwide
friendly_url    INDEX   518623      friendly_url
promotion_id    INDEX   1       promotion_id
city_id         INDEX   1   city_id
area_id     INDEX   1       area_id
zip_code    INDEX   2790        zip_code
importID    INDEX   518623      importID
image_id    INDEX   10          image_id

--------------
index_browse_category   INDEX   52 
level
status
warehouse   
updated
-----------------
keywords    FULLTEXT    1 
description
keywords
category_search 


Parts_Category

id              int(11)         No   auto_increment     
parts_id        int(11)             No  0       
category_id         int(11)             No  0       
main_category_id    int(10)             No  0   

Index

PRIMARY          PRIMARY    519330           id
category_id          INDEX  519330          category_id
parts_id
main_category_id     INDEX  519330              main_category_id
parts_id





like image 940
gus Avatar asked Jul 22 '09 14:07

gus


People also ask

How do I optimize a large table in MySQL?

Remove any unnecessary indexes on the table, paying particular attention to UNIQUE indexes as these disable change buffering. Don't use a UNIQUE index unless you need it; instead, employ a regular INDEX. Take a look at your slow query log every week or two. Pick the slowest three queries and optimize those.

How do you optimize a GROUP BY query?

The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any).


2 Answers

Try rewriting your query as this:

SELECT  p.*, i.type, i.width, i.height,
        (SELECT name FROM Location_State WHERE id = p.state_id) AS state, 
        (SELECT name FROM Location_Region WHERE id = p.region_id) AS region, 
        (SELECT start_date FROM Promotion WHERE id = p.promotion_id) AS promotion_start_date, 
        (SELECT end_date FROM Promotion WHERE id = p.promotion_id) AS promotion_end_date 
FROM    parts p
LEFT JOIN
        image i
ON      i.id = p.image_id
WHERE   EXISTS (
        SELECT  NULL
        FROM    Parts_Category pc
        WHERE   pc.category_id = '40'
                AND pc.parts_id = p.id
        UNION ALL
        SELECT  NULL
        FROM    Parts_Category pc
        WHERE   pc.main_category_id = '40'
                AND pc.parts_id = p.id
        )
        AND p.status = 'A'
ORDER BY
        p.status DESC, p.level DESC, p.warehouse DESC, p.updated DESC
LIMIT   15

You need the following indexes for this to work efficiently:

parts (status, level, warehouse, updated) -- this one you have
parts_category (category_id, parts_id)
parts_category (main_category_id, parts_id)

Update:

I just created the tables as this:

DROP TABLE IF EXISTS `test`.`image`;
CREATE TABLE  `test`.`image` (
  `id` int(11) NOT NULL,
  `type` int(11) NOT NULL,
  `width` int(11) NOT NULL,
  `height` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`location_region`;
CREATE TABLE  `test`.`location_region` (
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`location_state`;
CREATE TABLE  `test`.`location_state` (
  `id` int(11) NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`parts`;
CREATE TABLE  `test`.`parts` (
  `id` int(11) NOT NULL,
  `status` char(1) NOT NULL,
  `level` int(11) NOT NULL,
  `warehouse` int(11) NOT NULL,
  `updated` int(11) NOT NULL,
  `state_id` int(11) NOT NULL,
  `region_id` int(11) NOT NULL,
  `promotion_id` int(11) NOT NULL,
  `image_id` int(11) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  KEY `status` (`status`,`level`,`warehouse`,`updated`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`parts_category`;
CREATE TABLE  `test`.`parts_category` (
  `id` int(11) NOT NULL,
  `parts_id` int(11) NOT NULL,
  `category_id` int(11) NOT NULL,
  `main_category_id` int(11) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ix_pc_cat_parts` (`category_id`,`parts_id`),
  KEY `ix_pc_main_parts` (`main_category_id`,`parts_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

DROP TABLE IF EXISTS `test`.`promotion`;
CREATE TABLE  `test`.`promotion` (
  `id` int(11) NOT NULL,
  `start_date` datetime NOT NULL,
  `end_date` datetime NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

and filled them with sample data:

INSERT
INTO    parts
SELECT  id,
        CASE WHEN RAND() < 0.1 THEN 'A' ELSE 'B' END,
        RAND() * 100,
        RAND() * 100,
        RAND() * 100,
        RAND() * 50,
        RAND() * 50,
        RAND() * 50,
        RAND() * 50
FROM    t_source
LIMIT 500000;
INSERT
INTO    parts_category
SELECT  id,
        id,
        RAND() * 100,
        RAND() * 100
FROM    t_source
LIMIT 500000;
INSERT
INTO    location_state
SELECT  id, CONCAT('State ', id)
FROM    t_source
LIMIT 1000;
INSERT
INTO    location_region
SELECT  id, CONCAT('Region ', id)
FROM    t_source
LIMIT 1000;
INSERT
INTO    promotion
SELECT  id,
        '2009-07-22' - INTERVAL RAND() * 5 - 20 DAY,
        '2009-07-22' - INTERVAL RAND() * 5 DAY
FROM    t_source
LIMIT 1000;

The query above runs for 30 milliseconds and yields the following plan:

1, 'PRIMARY', 'p', 'ref', 'status', 'status', '3', 'const', 107408, 'Using where'
1, 'PRIMARY', 'i', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.image_id', 1, ''
6, 'DEPENDENT SUBQUERY', 'pc', 'ref', 'ix_pc_cat_parts', 'ix_pc_cat_parts', '8', 'const,test.p.id', 1, 'Using index'
7, 'DEPENDENT UNION', 'pc', 'ref', 'ix_pc_main_parts', 'ix_pc_main_parts', '8', 'const,test.p.id', 1, 'Using index'
, 'UNION RESULT', '<union6,7>', 'ALL', '', '', '', '', , ''
5, 'DEPENDENT SUBQUERY', 'Promotion', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.promotion_id', 1, ''
4, 'DEPENDENT SUBQUERY', 'Promotion', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.promotion_id', 1, ''
3, 'DEPENDENT SUBQUERY', 'Location_Region', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.region_id', 1, ''
2, 'DEPENDENT SUBQUERY', 'Location_State', 'eq_ref', 'PRIMARY', 'PRIMARY', '4', 'test.p.state_id', 1, ''

As you can see, no temporary, no filesort, everything's very fast.

To help you anymore, I just need to see how your tables are defined.

like image 154
Quassnoi Avatar answered Oct 13 '22 00:10

Quassnoi


John, the problem is your query is constructed such that it is selecting from a derived table. The derived table cannot benefit from your indices. Try updating your query as follows:

SELECT
  Parts.*,
  Image.type, Image.width, Image.height,
  Location_State.name AS state,
  Location_Region.name AS region,
  Promotion.start_date AS promotion_start_date,
  Promotion.end_date AS promotion_end_date
FROM Parts
LEFT JOIN Image ON Parts.image_id = Image.id
LEFT JOIN Location_State ON Parts.state_id = Location_State.id
LEFT JOIN Location_Region ON Parts.state_id = Location_Region.id
LEFT JOIN Promotion ON Parts.promotion_id = Promotion.id
INNER JOIN Parts_Category ON (Parts_Category.category_id = 40 OR Parts_Category.main_category_id = 40)
WHERE Parts.status = 'A'
GROUP BY Parts.id
ORDER BY Parts.level DESC, Parts.warehouse DESC, Parts.updated DESC LIMIT 0, 15

Note, if you don't need LEFT JOIN's for your Location_State, Location_Region, Promotion tables, then use an INNER JOIN instead. It will likely perform better.

To further assist in optimizing this query, please provide the following:

SHOW CREATE TABLE Parts;

If the rewritten query I provided works the same as your example (it should), then also provide:

EXPLAIN <my query here>\G
like image 45
hobodave Avatar answered Oct 13 '22 00:10

hobodave