Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to tune a 7-table-join MySQL count query where tables contain 30,000+ rows?

I have an sql query that counts the number of results for a complex query. The actual select query is very fast when limiting to 20 results, but the count version takes about 4.5 seconds on my current tables after lots of optimizing.

If I remove the two joins and where clauses on site tags and gallery tags, the query performs at 1.5 seconds. If I create 3 separate queries - one to select the pay sites, one to select the names and one to pull everything together - I can get the query down to .6 seconds, which is still not good enough. This would also force me to use a stored procedure since I will have to make a total of 4 queries in Hibernate.

For the query "as is", here is some info:

The Handler_read_key is 1746669
The Handler_read_next is 1546324

The gallery table has 40,000 rows
The site table has 900 rows
The name table has 800 rows
The tag table has 3560 rows

I'm pretty new to MySQL and tuning, and I have indexes on the:

  • 'term' column in the tag table
  • 'published' column in the gallery table
  • 'value' for the name table

I am looking to get this query to 0.1 milliseconds.

SELECT count(distinct gallery.id)
from gallery gallery 
    inner join
        site site 
            on gallery.site_id = site.id 
    inner join
        site_to_tag p2t 
            on site.id = p2t.site_id 
    inner join
        tag site_tag 
            on p2t.tag_id = site_tag.id 
    inner join
        gallery_to_name g2mn 
            on gallery.id = g2mn.gallery_id 
    inner join
        name name 
            on g2mn.name_id = name.id 
    inner join
        gallery_to_tag g2t 
            on gallery.id = g2t.gallery_id 
    inner join
        tag tag 
            on g2t.tag_id = tag.id
where
    gallery.published = true and (
        name.value LIKE 'sometext%' or
        tag.term = 'sometext' or 
        site.`name` like 'sometext%' or
        site_tag.term = 'sometext'
    )

Explain Data:

| id | select_type | table        | type   | possible_keys                                                     | key                | key_len | ref                                       | rows | Extra                              |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
|  1 | SIMPLE      | site         | index  | PRIMARY,nameIndex                                                 | nameIndex          | 258     | NULL                                      |  950 | Using index; Using temporary       |
|  1 | SIMPLE      | gallery      | ref    | PRIMARY,publishedIndex,FKF44C775296EECE37,publishedSiteIdIndex    | FKF44C775296EECE37 | 9       | production.site.id                        |   20 | Using where                        |
|  1 | SIMPLE      | g2mn         | ref    | PRIMARY,FK3EFFD7F8AFAD7A5E,FK3EFFD7F832C04188                     | FK3EFFD7F8AFAD7A5E | 8       | production.gallery.id                     |    1 | Using index; Distinct              |
|  1 | SIMPLE      | name         | eq_ref | PRIMARY,valueIndex                                                | PRIMARY            | 8       | production.g2mn.name_id                   |    1 | Distinct                           |
|  1 | SIMPLE      | g2t          | ref    | PRIMARY,FK3DDB4D63AFAD7A5E,FK3DDB4D63E210FBA6                     | FK3DDB4D63AFAD7A5E | 8       | production.g2mn.gallery_id                |    2 | Using where; Using index; Distinct |
|  1 | SIMPLE      | tag          | eq_ref | PRIMARY,termIndex                                                 | PRIMARY            | 8       | production.g2t.tag_id                     |    1 | Distinct                           |
|  1 | SIMPLE      | p2t          | ref    | PRIMARY,FK29424AB796EECE37,FK29424AB7E210FBA6                     | PRIMARY            | 8       | production.gallery.site_id                |    3 | Using where; Using index; Distinct |
|  1 | SIMPLE      | site_tag     | eq_ref | PRIMARY,termIndex                                                 | PRIMARY            | 8       | production.p2t.tag_id                     |    1 | Using where; Distinct              |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+

Individual Count Speeds:

[SQL] select count(*) from gallery;
Affected rows: 0
Time: 0.014ms
Results: 40385

[SQL] 
select count(*) from gallery_to_name;
Affected rows: 0
Time: 0.012ms
Results: 35615

[SQL] 
select count(*) from gallery_to_tag;
Affected rows: 0
Time: 0.055ms
Results: 165104

[SQL] 
select count(*) from tag;
Affected rows: 0
Time: 0.002ms
Results: 3560    

[SQL] 
select count(*) from site;
Affected rows: 0
Time: 0.001ms
Results: 901

[SQL] 
select count(*) from site_to_tag;
Affected rows: 0
Time: 0.003ms
Results: 7026
like image 708
egervari Avatar asked Feb 04 '23 04:02

egervari


1 Answers

I've included my test schema and a script to produce test data at the end of this post. I have used the SQL_NO_CACHE option to prevent MySQL from caching query results - this is just for testing and should ultimately be removed.

This is a similar idea to that proposed by Donnie, but I have tidied it up a little. If I have understood the joins correctly, there is no need to repeat all the joins in each select, as each is effectively independent from the others. The original WHERE clause stipulates that gallery.published must be true and then follows with a series of 4 conditions joined by OR. Each query can therefore be executed separately. Here are the four joins:

gallery <--> gallery_to_name <--> name
gallery <--> gallery_to_tag <--> tag
gallery <--> site
gallery <--> site <--> site_to_tag <--> tag

Because gallery contains site_id, in this case, there's no need for the intermediate join via the site table. The last join can therefore be reduced to this:

gallery <--> site_to_tag <--> tag

Running each SELECT separately, and using UNION to combine the results, is very fast. The results here assume the table structures and indexes shown at the end of this post:

SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site AS s ON s.id = g.site_id
    WHERE g.published = TRUE AND s.name LIKE '3GRD%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
    INNER JOIN name AS n ON n.id = g2n.name_id
    WHERE g.published = TRUE AND n.value LIKE '3GRD%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
    INNER JOIN tag AS gt  ON gt.id = g2t.tag_id
    WHERE g.published = TRUE AND gt.term = '3GRD')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
    INNER JOIN tag AS st  ON st.id = s2t.tag_id
    WHERE g.published = TRUE AND st.term = '3GRD')
) AS totals;

+---------+
| matches |
+---------+
|      99 |
+---------+
1 row in set (0.00 sec)

The speed does vary depending on the search criteria. In the following example, a different search value is used for each table, and the LIKE operator has to do a little more work, as there are now more potential matches for each:

SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site AS s ON s.id = g.site_id
    WHERE g.published = TRUE AND s.name LIKE '3H%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
    INNER JOIN name AS n ON n.id = g2n.name_id
    WHERE g.published = TRUE AND n.value LIKE '3G%')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
    INNER JOIN tag AS gt  ON gt.id = g2t.tag_id
    WHERE g.published = TRUE AND gt.term = '3IDP')
UNION
   (SELECT g.id
    FROM gallery AS g
    INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
    INNER JOIN tag AS st  ON st.id = s2t.tag_id
    WHERE g.published = TRUE AND st.term = '3OJX')
) AS totals;

+---------+
| matches |
+---------+
|   12505 |
+---------+
1 row in set (0.24 sec)

These results compare favourably with the a query which uses multiple joins:

SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name            AS n   ON n.id = g2n.name_id
INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag             AS gt  ON gt.id = g2t.tag_id
INNER JOIN site            AS s   ON s.id = g.site_id
INNER JOIN site_to_tag     AS s2t ON s2t.site_id = s.id
INNER JOIN tag             AS st  ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
    gt.term = '3GRD' OR
    st.term = '3GRD' OR
    n.value LIKE '3GRD%' OR
    s.name LIKE '3GRD%');

+---------+
| matches |
+---------+
|      99 |
+---------+
1 row in set (2.62 sec)

SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name            AS n   ON n.id = g2n.name_id
INNER JOIN gallery_to_tag  AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag             AS gt  ON gt.id = g2t.tag_id
INNER JOIN site            AS s   ON s.id = g.site_id
INNER JOIN site_to_tag     AS s2t ON s2t.site_id = s.id
INNER JOIN tag             AS st  ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
    gt.term = '3IDP' OR
    st.term = '3OJX' OR
    n.value LIKE '3G%' OR
    s.name LIKE '3H%');

+---------+
| matches |
+---------+
|   12505 |
+---------+
1 row in set (3.17 sec)

SCHEMA
The indexes on id columns plus site.name, name.value and tag.term are important:

DROP SCHEMA IF EXISTS `egervari`;
CREATE SCHEMA IF NOT EXISTS `egervari`;
USE `egervari`;

-- -----------------------------------------------------
-- Table `site`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `site` ;
CREATE  TABLE IF NOT EXISTS `site` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `name` VARCHAR(255) NOT NULL ,
  INDEX `name` (`name` ASC) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery` ;
CREATE  TABLE IF NOT EXISTS `gallery` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `site_id` INT UNSIGNED NOT NULL ,
  `published` TINYINT(1) NOT NULL DEFAULT 0 ,
  PRIMARY KEY (`id`) ,
  INDEX `fk_gallery_site` (`site_id` ASC) ,
  CONSTRAINT `fk_gallery_site`
    FOREIGN KEY (`site_id` )
    REFERENCES `site` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `name`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `name` ;
CREATE  TABLE IF NOT EXISTS `name` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `value` VARCHAR(255) NOT NULL ,
  INDEX `value` (`value` ASC) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `tag` ;
CREATE  TABLE IF NOT EXISTS `tag` (
  `id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
  `term` VARCHAR(255) NOT NULL ,
  INDEX `term` (`term` ASC) ,
  PRIMARY KEY (`id`) )
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery_to_name`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery_to_name` ;
CREATE  TABLE IF NOT EXISTS `gallery_to_name` (
  `gallery_id` INT UNSIGNED NOT NULL ,
  `name_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`gallery_id`, `name_id`) ,
  INDEX `fk_gallery_to_name_gallery` (`gallery_id` ASC) ,
  INDEX `fk_gallery_to_name_name` (`name_id` ASC) ,
  CONSTRAINT `fk_gallery_to_name_gallery`
    FOREIGN KEY (`gallery_id` )
    REFERENCES `gallery` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_gallery_to_name_name`
    FOREIGN KEY (`name_id` )
    REFERENCES `name` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `gallery_to_tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `gallery_to_tag` ;
CREATE  TABLE IF NOT EXISTS `gallery_to_tag` (
  `gallery_id` INT UNSIGNED NOT NULL ,
  `tag_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`gallery_id`, `tag_id`) ,
  INDEX `fk_gallery_to_tag_gallery` (`gallery_id` ASC) ,
  INDEX `fk_gallery_to_tag_tag` (`tag_id` ASC) ,
  CONSTRAINT `fk_gallery_to_tag_gallery`
    FOREIGN KEY (`gallery_id` )
    REFERENCES `gallery` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_gallery_to_tag_tag`
    FOREIGN KEY (`tag_id` )
    REFERENCES `tag` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

-- -----------------------------------------------------
-- Table `site_to_tag`
-- -----------------------------------------------------

DROP TABLE IF EXISTS `site_to_tag` ;
CREATE  TABLE IF NOT EXISTS `site_to_tag` (
  `site_id` INT UNSIGNED NOT NULL ,
  `tag_id` INT UNSIGNED NOT NULL ,
  PRIMARY KEY (`site_id`, `tag_id`) ,
  INDEX `fk_site_to_tag_site` (`site_id` ASC) ,
  INDEX `fk_site_to_tag_tag` (`tag_id` ASC) ,
  CONSTRAINT `fk_site_to_tag_site`
    FOREIGN KEY (`site_id` )
    REFERENCES `site` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE,
  CONSTRAINT `fk_site_to_tag_tag`
    FOREIGN KEY (`tag_id` )
    REFERENCES `tag` (`id` )
    ON DELETE CASCADE
    ON UPDATE CASCADE)
ENGINE = InnoDB;

TEST DATA
This populates site with 900 rows, tag with 3560 rows, name with 800 rows and gallery with 40,000 rows, and inserts entries into the link tables:

DELIMITER //
DROP PROCEDURE IF EXISTS populate//
CREATE PROCEDURE populate()
BEGIN
    DECLARE i INT DEFAULT 0;

    WHILE i < 900 DO
        INSERT INTO site (name) VALUES (CONV(i + 1 * 10000, 20, 36));
        SET i = i + 1;
    END WHILE;

    SET i = 0;
    WHILE i < 3560 DO
        INSERT INTO tag (term) VALUES (CONV(i + 1 * 10000, 20, 36));
        INSERT INTO site_to_tag (site_id, tag_id) VALUES ( (i MOD 900) + 1, i + 1 );
        SET i = i + 1;
    END WHILE;

    SET i = 0;
    WHILE i < 800 DO
        INSERT INTO name (value) VALUES (CONV(i + 1 * 10000, 20, 36));
        SET i = i + 1;
    END WHILE;

    SET i = 0;
    WHILE i < 40000 DO    
        INSERT INTO gallery (site_id, published) VALUES ( (i MOD 900) + 1, i MOD 2 );
        INSERT INTO gallery_to_name (gallery_id, name_id) VALUES ( i + 1, (i MOD 800) + 1 );
        INSERT INTO gallery_to_tag (gallery_id, tag_id) VALUES ( i + 1, (i MOD 3560) + 1 );
        SET i = i + 1;
    END WHILE;
END;
//
DELIMITER ;
CALL populate();
like image 96
Mike Avatar answered Feb 05 '23 18:02

Mike