I have an sql query that counts the number of results for a complex query. The actual select query is very fast when limiting to 20 results, but the count version takes about 4.5 seconds on my current tables after lots of optimizing.
If I remove the two joins and where clauses on site tags and gallery tags, the query performs at 1.5 seconds. If I create 3 separate queries - one to select the pay sites, one to select the names and one to pull everything together - I can get the query down to .6 seconds, which is still not good enough. This would also force me to use a stored procedure since I will have to make a total of 4 queries in Hibernate.
For the query "as is", here is some info:
The Handler_read_key is 1746669
The Handler_read_next is 1546324
The gallery table has 40,000 rows
The site table has 900 rows
The name table has 800 rows
The tag table has 3560 rows
I'm pretty new to MySQL and tuning, and I have indexes on the:
I am looking to get this query to 0.1 milliseconds.
SELECT count(distinct gallery.id)
from gallery gallery
inner join
site site
on gallery.site_id = site.id
inner join
site_to_tag p2t
on site.id = p2t.site_id
inner join
tag site_tag
on p2t.tag_id = site_tag.id
inner join
gallery_to_name g2mn
on gallery.id = g2mn.gallery_id
inner join
name name
on g2mn.name_id = name.id
inner join
gallery_to_tag g2t
on gallery.id = g2t.gallery_id
inner join
tag tag
on g2t.tag_id = tag.id
where
gallery.published = true and (
name.value LIKE 'sometext%' or
tag.term = 'sometext' or
site.`name` like 'sometext%' or
site_tag.term = 'sometext'
)
Explain Data:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
| 1 | SIMPLE | site | index | PRIMARY,nameIndex | nameIndex | 258 | NULL | 950 | Using index; Using temporary |
| 1 | SIMPLE | gallery | ref | PRIMARY,publishedIndex,FKF44C775296EECE37,publishedSiteIdIndex | FKF44C775296EECE37 | 9 | production.site.id | 20 | Using where |
| 1 | SIMPLE | g2mn | ref | PRIMARY,FK3EFFD7F8AFAD7A5E,FK3EFFD7F832C04188 | FK3EFFD7F8AFAD7A5E | 8 | production.gallery.id | 1 | Using index; Distinct |
| 1 | SIMPLE | name | eq_ref | PRIMARY,valueIndex | PRIMARY | 8 | production.g2mn.name_id | 1 | Distinct |
| 1 | SIMPLE | g2t | ref | PRIMARY,FK3DDB4D63AFAD7A5E,FK3DDB4D63E210FBA6 | FK3DDB4D63AFAD7A5E | 8 | production.g2mn.gallery_id | 2 | Using where; Using index; Distinct |
| 1 | SIMPLE | tag | eq_ref | PRIMARY,termIndex | PRIMARY | 8 | production.g2t.tag_id | 1 | Distinct |
| 1 | SIMPLE | p2t | ref | PRIMARY,FK29424AB796EECE37,FK29424AB7E210FBA6 | PRIMARY | 8 | production.gallery.site_id | 3 | Using where; Using index; Distinct |
| 1 | SIMPLE | site_tag | eq_ref | PRIMARY,termIndex | PRIMARY | 8 | production.p2t.tag_id | 1 | Using where; Distinct |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
Individual Count Speeds:
[SQL] select count(*) from gallery;
Affected rows: 0
Time: 0.014ms
Results: 40385
[SQL]
select count(*) from gallery_to_name;
Affected rows: 0
Time: 0.012ms
Results: 35615
[SQL]
select count(*) from gallery_to_tag;
Affected rows: 0
Time: 0.055ms
Results: 165104
[SQL]
select count(*) from tag;
Affected rows: 0
Time: 0.002ms
Results: 3560
[SQL]
select count(*) from site;
Affected rows: 0
Time: 0.001ms
Results: 901
[SQL]
select count(*) from site_to_tag;
Affected rows: 0
Time: 0.003ms
Results: 7026
I've included my test schema and a script to produce test data at the end of this post. I have used the SQL_NO_CACHE
option to prevent MySQL from caching query results - this is just for testing and should ultimately be removed.
This is a similar idea to that proposed by Donnie, but I have tidied it up a little. If I have understood the joins correctly, there is no need to repeat all the joins in each select, as each is effectively independent from the others. The original WHERE
clause stipulates that gallery.published
must be true and then follows with a series of 4 conditions joined by OR
. Each query can therefore be executed separately. Here are the four joins:
gallery <--> gallery_to_name <--> name
gallery <--> gallery_to_tag <--> tag
gallery <--> site
gallery <--> site <--> site_to_tag <--> tag
Because gallery
contains site_id
, in this case, there's no need for the intermediate join via the site
table. The last join can therefore be reduced to this:
gallery <--> site_to_tag <--> tag
Running each SELECT
separately, and using UNION
to combine the results, is very fast. The results here assume the table structures and indexes shown at the end of this post:
SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
(SELECT g.id
FROM gallery AS g
INNER JOIN site AS s ON s.id = g.site_id
WHERE g.published = TRUE AND s.name LIKE '3GRD%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
WHERE g.published = TRUE AND n.value LIKE '3GRD%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
WHERE g.published = TRUE AND gt.term = '3GRD')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND st.term = '3GRD')
) AS totals;
+---------+
| matches |
+---------+
| 99 |
+---------+
1 row in set (0.00 sec)
The speed does vary depending on the search criteria. In the following example, a different search value is used for each table, and the LIKE operator has to do a little more work, as there are now more potential matches for each:
SELECT SQL_NO_CACHE COUNT(id) AS matches FROM (
(SELECT g.id
FROM gallery AS g
INNER JOIN site AS s ON s.id = g.site_id
WHERE g.published = TRUE AND s.name LIKE '3H%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
WHERE g.published = TRUE AND n.value LIKE '3G%')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
WHERE g.published = TRUE AND gt.term = '3IDP')
UNION
(SELECT g.id
FROM gallery AS g
INNER JOIN site_to_tag AS s2t ON s2t.site_id = g.site_id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND st.term = '3OJX')
) AS totals;
+---------+
| matches |
+---------+
| 12505 |
+---------+
1 row in set (0.24 sec)
These results compare favourably with the a query which uses multiple joins:
SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
INNER JOIN site AS s ON s.id = g.site_id
INNER JOIN site_to_tag AS s2t ON s2t.site_id = s.id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
gt.term = '3GRD' OR
st.term = '3GRD' OR
n.value LIKE '3GRD%' OR
s.name LIKE '3GRD%');
+---------+
| matches |
+---------+
| 99 |
+---------+
1 row in set (2.62 sec)
SELECT SQL_NO_CACHE COUNT(DISTINCT g.id) AS matches
FROM gallery AS g
INNER JOIN gallery_to_name AS g2n ON g2n.gallery_id = g.id
INNER JOIN name AS n ON n.id = g2n.name_id
INNER JOIN gallery_to_tag AS g2t ON g2t.gallery_id = g.id
INNER JOIN tag AS gt ON gt.id = g2t.tag_id
INNER JOIN site AS s ON s.id = g.site_id
INNER JOIN site_to_tag AS s2t ON s2t.site_id = s.id
INNER JOIN tag AS st ON st.id = s2t.tag_id
WHERE g.published = TRUE AND (
gt.term = '3IDP' OR
st.term = '3OJX' OR
n.value LIKE '3G%' OR
s.name LIKE '3H%');
+---------+
| matches |
+---------+
| 12505 |
+---------+
1 row in set (3.17 sec)
SCHEMA
The indexes on id columns plus site.name
, name.value
and tag.term
are important:
DROP SCHEMA IF EXISTS `egervari`;
CREATE SCHEMA IF NOT EXISTS `egervari`;
USE `egervari`;
-- -----------------------------------------------------
-- Table `site`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `site` ;
CREATE TABLE IF NOT EXISTS `site` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`name` VARCHAR(255) NOT NULL ,
INDEX `name` (`name` ASC) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `gallery`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `gallery` ;
CREATE TABLE IF NOT EXISTS `gallery` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`site_id` INT UNSIGNED NOT NULL ,
`published` TINYINT(1) NOT NULL DEFAULT 0 ,
PRIMARY KEY (`id`) ,
INDEX `fk_gallery_site` (`site_id` ASC) ,
CONSTRAINT `fk_gallery_site`
FOREIGN KEY (`site_id` )
REFERENCES `site` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `name`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `name` ;
CREATE TABLE IF NOT EXISTS `name` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`value` VARCHAR(255) NOT NULL ,
INDEX `value` (`value` ASC) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `tag`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `tag` ;
CREATE TABLE IF NOT EXISTS `tag` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT ,
`term` VARCHAR(255) NOT NULL ,
INDEX `term` (`term` ASC) ,
PRIMARY KEY (`id`) )
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `gallery_to_name`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `gallery_to_name` ;
CREATE TABLE IF NOT EXISTS `gallery_to_name` (
`gallery_id` INT UNSIGNED NOT NULL ,
`name_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`gallery_id`, `name_id`) ,
INDEX `fk_gallery_to_name_gallery` (`gallery_id` ASC) ,
INDEX `fk_gallery_to_name_name` (`name_id` ASC) ,
CONSTRAINT `fk_gallery_to_name_gallery`
FOREIGN KEY (`gallery_id` )
REFERENCES `gallery` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_gallery_to_name_name`
FOREIGN KEY (`name_id` )
REFERENCES `name` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `gallery_to_tag`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `gallery_to_tag` ;
CREATE TABLE IF NOT EXISTS `gallery_to_tag` (
`gallery_id` INT UNSIGNED NOT NULL ,
`tag_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`gallery_id`, `tag_id`) ,
INDEX `fk_gallery_to_tag_gallery` (`gallery_id` ASC) ,
INDEX `fk_gallery_to_tag_tag` (`tag_id` ASC) ,
CONSTRAINT `fk_gallery_to_tag_gallery`
FOREIGN KEY (`gallery_id` )
REFERENCES `gallery` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_gallery_to_tag_tag`
FOREIGN KEY (`tag_id` )
REFERENCES `tag` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
-- -----------------------------------------------------
-- Table `site_to_tag`
-- -----------------------------------------------------
DROP TABLE IF EXISTS `site_to_tag` ;
CREATE TABLE IF NOT EXISTS `site_to_tag` (
`site_id` INT UNSIGNED NOT NULL ,
`tag_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`site_id`, `tag_id`) ,
INDEX `fk_site_to_tag_site` (`site_id` ASC) ,
INDEX `fk_site_to_tag_tag` (`tag_id` ASC) ,
CONSTRAINT `fk_site_to_tag_site`
FOREIGN KEY (`site_id` )
REFERENCES `site` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE,
CONSTRAINT `fk_site_to_tag_tag`
FOREIGN KEY (`tag_id` )
REFERENCES `tag` (`id` )
ON DELETE CASCADE
ON UPDATE CASCADE)
ENGINE = InnoDB;
TEST DATA
This populates site
with 900 rows, tag
with 3560 rows, name
with 800 rows and gallery
with 40,000 rows, and inserts entries into the link tables:
DELIMITER //
DROP PROCEDURE IF EXISTS populate//
CREATE PROCEDURE populate()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 900 DO
INSERT INTO site (name) VALUES (CONV(i + 1 * 10000, 20, 36));
SET i = i + 1;
END WHILE;
SET i = 0;
WHILE i < 3560 DO
INSERT INTO tag (term) VALUES (CONV(i + 1 * 10000, 20, 36));
INSERT INTO site_to_tag (site_id, tag_id) VALUES ( (i MOD 900) + 1, i + 1 );
SET i = i + 1;
END WHILE;
SET i = 0;
WHILE i < 800 DO
INSERT INTO name (value) VALUES (CONV(i + 1 * 10000, 20, 36));
SET i = i + 1;
END WHILE;
SET i = 0;
WHILE i < 40000 DO
INSERT INTO gallery (site_id, published) VALUES ( (i MOD 900) + 1, i MOD 2 );
INSERT INTO gallery_to_name (gallery_id, name_id) VALUES ( i + 1, (i MOD 800) + 1 );
INSERT INTO gallery_to_tag (gallery_id, tag_id) VALUES ( i + 1, (i MOD 3560) + 1 );
SET i = i + 1;
END WHILE;
END;
//
DELIMITER ;
CALL populate();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With