Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mysql slow count in join query

Tags:

sql

mysql

so i have two tables that i need to be able to get counts for. One of them holds the content and the other on the relationship between it and the categories table. Here are the DDl :

CREATE TABLE content_en (
    id int(11) NOT NULL AUTO_INCREMENT,
    title varchar(100) DEFAULT NULL,
    uid int(11) DEFAULT NULL,
    date_added int(11) DEFAULT NULL,
    date_modified int(11) DEFAULT NULL,
    active tinyint(1) DEFAULT NULL,
    comment_count int(6) DEFAULT NULL,
    orderby tinyint(4) DEFAULT NULL,
    settings text,
    permalink varchar(255) DEFAULT NULL,
    code varchar(3) DEFAULT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY id (id),
    UNIQUE KEY id_2 (id) USING BTREE,
    UNIQUE KEY combo (id,active) USING HASH,
    KEY code (code) USING BTREE
) ENGINE=MyISAM AUTO_INCREMENT=127126 DEFAULT CHARSET=utf8;

and for the other table

CREATE TABLE content_page_categories (
    catid int(11) unsigned NOT NULL,
    itemid int(10) unsigned NOT NULL,
    main tinyint(1) DEFAULT NULL,
    KEY itemid (itemid),
    KEY catid (catid),
    KEY combo (catid,itemid) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

The query i'm running is :

SELECT count(*) 
FROM content_page_categories USE INDEX (combo) 
INNER JOIN content_en USE INDEX (combo) ON (id = itemid) 
WHERE catid = 1 AND active = 1 ;

Both tables have 125k rows and i can't get the count query to run fast enough. Best timing i get is 0.175 which is horrible for this ammount of rows. Selecting 100 rows is as fast as 0.01. I have tried like 3 or 4 variations of this query but in the end the timings are just about the same. Also, if i don't do USE INDEX timing goes 3x slower.

Also tried the following : SELECT COUNT( *) FROM content_page_categories INNER JOIN content_en ON id=itemid AND catid = 1 AND active = 1 WHERE 1

and :

SELECT SQL_CALC_FOUND_ROWS catid,content_en.* FROM content_page_categories INNER JOIN content_en ON (id=itemid) WHERE catid =1 AND active = 1 LIMIT 1; SELECT FOUND_ROWS();

Index definitions : content_en 0 PRIMARY 1 id A 125288 BTREE
content_en 0 id 1 id A 125288 BTREE
content_en 0 id_2 1 id A 125288 BTREE
content_en 0 combo 1 id A BTREE
content_en 0 combo 2 active A YES BTREE
content_en 1 code 1 code A 42 YES BTREE

content_page_categories 1 itemid 1 itemid A 96842 BTREE
content_page_categories 1 catid 1 catid A 10 BTREE
content_page_categories 1 combo 1 catid A 10 BTREE
content_page_categories 1 combo 2 itemid A 96842 BTREE

Any ideas?

[EDIT]

i have uploaded sample data for these tables here

result of explain :

mysql> explain SELECT count(*) FROM  content_page_categories USE INDEX (combo) I<br>
NNER JOIN content_en USE INDEX (combo) ON (id = itemid) WHERE  catid = 1 AND act<br>
ive = 1 ;

+----+-------------+-------------------------+-------+---------------+-------+---------+--------------------------+--------+--------------------------+
| id | select_type | table                   | type  | possible_keys | key   | key_len | ref                      | rows   | Extra                    |
+----+-------------+-------------------------+-------+---------------+-------+---------+--------------------------+--------+--------------------------+
|  1 | SIMPLE      | content_en              | index | combo         | combo | 6 | NULL                     | 125288 | Using where; Using index |
|  1 | SIMPLE      | content_page_categories | ref   | combo         | combo | 8 | const,mcms.content_en.id |      1 | Using where; Using index |
+----+-------------+-------------------------+-------+---------------+-------+---------+--------------------------+--------+--------------------------+
2 rows in set (0.00 sec)
like image 1000
mbouclas Avatar asked Jul 04 '13 12:07

mbouclas


1 Answers

I downloaded your data and tried a few experiments. I'm running MySQL 5.6.12 on a CentOS virtual machine on a Macbook Pro. The times I observed can be used for comparison, but your system may have different performance.

Base case

First I tried without the USE INDEX clauses, because I avoid optimizer overrides where possible. In most cases, a simple query like this should use the correct index if it's available. Hard-coding the index choice in a query makes it harder to use a better index later.

I also use correlation names (table aliases) to make the query more clear.

mysql> EXPLAIN SELECT COUNT(*) FROM content_en AS e  
INNER JOIN content_page_categories AS c ON c.itemid = e.id 
WHERE c.catid = 1 AND e.active = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: ref
possible_keys: combo,combo2
          key: combo
      key_len: 4
          ref: const
         rows: 71198
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
         type: eq_ref
possible_keys: PRIMARY,combo2,combo
          key: PRIMARY
      key_len: 4
          ref: test.c.itemid
         rows: 1
        Extra: Using where
  • This executed in 0.36 seconds.

Covering index

I'd like to get "Using index" on the second table as well, so I need an index on (active, id) in that order. I had to USE INDEX in this case to persuade the optimizer not to use the primary key.

mysql> ALTER TABLE content_en ADD KEY combo2 (active, id);

mysql> explain SELECT COUNT(*) FROM content_en AS e USE INDEX (combo2) 
INNER JOIN content_page_categories AS c ON c.itemid = e.id 
WHERE c.catid = 1 AND e.active = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: ref
possible_keys: combo,combo2
          key: combo
      key_len: 4
          ref: const
         rows: 71198
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: e
         type: ref
possible_keys: combo2
          key: combo2
      key_len: 6
          ref: const,test.c.itemid
         rows: 1
        Extra: Using where; Using index

The rows reported by EXPLAIN is an important indicator of how much work it's going to take to execute the query. Notice the rows in the above EXPLAIN is only 71k, much smaller than the 125k rows you got when you scanned the content_en table first.

  • This executed in 0.44 seconds. This is unexpected, because usually a query using a covering index is an improvement.

Convert tables to InnoDB

I tried the same covering index solution as above, but with InnoDB as the storage engine.

mysql> ALTER TABLE content_en ENGINE=InnoDB;
mysql> ALTER TABLE content_page_categories ENGINE=InnoDB;

This had the same EXPLAIN report. It took 1 or 2 iterations to warm the buffer pool, but then the performance of the query tripled.

  • This executed in 0.16 seconds.

  • I also tried removing the USE INDEX, and the time increased slightly, to 0.17 seconds.

@Matthew's solution with STRAIGHT_JOIN

mysql> SELECT straight_join count(*) 
 FROM content_en 
 INNER JOIN content_page_categories use index (combo) 
  ON (id = itemid) 
 WHERE catid = 1 AND active = 1;
  • This executed in 0.20 - 0.22 seconds.

@bobwienholt's solution, denormalization

I tried the solution proposed by @bobwienholt, using denormalization to copy the active attribute to the content_page_categories table.

mysql> ALTER TABLE content_page_categories ADD COLUMN active TINYINT(1);
mysql> UPDATE content_en JOIN content_page_categories ON id = itemid 
    SET content_page_categories.active = content_en.active;
mysql> ALTER TABLE content_page_categories ADD KEY combo3 (catid,active);
mysql> SELECT COUNT(*) FROM content_page_categories WHERE catid = 1 and active = 1;

This executed in 0.037 - 0.044 seconds. So this is better, if you can maintain the redundant active column in sync with the value in the content_en table.

@Quassnoi's solution, summary table

I tried the solution proposed by @Quassnoi, to maintain a table with precomputed counts per catid and active. The table should have very few rows, and looking up the counts you need are primary key lookups and require no JOINs.

mysql> CREATE TABLE page_active_category (
 active INT NOT NULL, 
 catid INT NOT NULL, 
 cnt BIGINT NOT NULL,
 PRIMARY KEY (active, catid) 
) ENGINE=InnoDB;

mysql> INSERT INTO page_active_category
 SELECT  e.active, c.catid, COUNT(*)
 FROM    content_en AS e
 JOIN    content_page_categories AS c ON c.itemid = e.id
 GROUP BY e.active, c.catid

mysql> SELECT cnt FROM page_active_category WHERE active = 1 AND catid = 1

This executed in 0.0007 - 0.0017 seconds. So this is the best solution by an order of magnitude, if you can maintain the table with aggregate counts.

You can see from this that different types of denormalization (including a summary table) is an extremely powerful tool for the sake of performance, though it has drawbacks because maintaining the redundant data can be inconvenient and makes your application more complex.

like image 51
Bill Karwin Avatar answered Nov 05 '22 14:11

Bill Karwin