
How do I improve performance on a DISTINCT select across three joined tables?

I have the following tables in question:

  • Personas
  • ImpressionsPersonas [join table - Personas ManyToMany Impressions]
  • Impressions

My query looks like this; the EXPLAIN output is below:

    SELECT DISTINCT
        Personas.id,
        Personas.parent_id,
        Personas.persona,
        Personas.subpersonas_count,
        Personas.is_subpersona,
        Personas.impressions_count,
        Personas.created,
        Personas.modified
    FROM personas AS Personas
    INNER JOIN impressions_personas AS ImpressionsPersonas
        ON Personas.id = ImpressionsPersonas.persona_id
    INNER JOIN impressions AS Impressions
        ON Impressions.id = ImpressionsPersonas.impression_id
       AND Impressions.timestamp >= '2016-06-01 00:00:00'
       AND Impressions.timestamp <= '2016-07-31 00:00:00'

EXPLAIN

    +----+-------------+---------------------+--------+-----------------------------------------------------------------------+-------------+---------+---------------------------------------------+------+----------+-----------------------+
    | id | select_type | table               | type   | possible_keys                                                         | key         | key_len | ref                                         | rows | filtered | Extra                 |
    +----+-------------+---------------------+--------+-----------------------------------------------------------------------+-------------+---------+---------------------------------------------+------+----------+-----------------------+
    |  1 | SIMPLE      | Personas            | ALL    | PRIMARY                                                               | NULL        | NULL    | NULL                                        |  159 |   100.00 | Using temporary       |
    |  1 | SIMPLE      | ImpressionsPersonas | ref    | impression_idx,persona_idx,comp_imp_persona,comp_imp_pri,comp_per_pri | persona_idx | 8       | gen1_d2go.Personas.id                       |  396 |   100.00 | Distinct              |
    |  1 | SIMPLE      | Impressions         | eq_ref | PRIMARY,timestamp,timestamp_id                                        | PRIMARY     | 8       | gen1_d2go.ImpressionsPersonas.impression_id |    1 |   100.00 | Using where; Distinct |
    +----+-------------+---------------------+--------+-----------------------------------------------------------------------+-------------+---------+---------------------------------------------+------+----------+-----------------------+
    3 rows in set, 1 warning (0.00 sec)

CREATE STATEMENT FOR PERSONAS

CREATE TABLE `personas` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `parent_id` bigint(20) unsigned DEFAULT NULL,
  `persona` varchar(150) NOT NULL,
  `subpersonas_count` int(10) unsigned DEFAULT '0',
  `is_subpersona` tinyint(1) unsigned DEFAULT '0',
  `impressions_count` bigint(20) unsigned DEFAULT '0',
  `created` datetime DEFAULT NULL,
  `modified` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `lookup` (`parent_id`,`persona`),
  KEY `parent_index` (`parent_id`),
  KEY `persona` (`persona`),
  KEY `persona_a_id` (`id`,`persona`),
  CONSTRAINT `self_referential_join_to_self` FOREIGN KEY (`parent_id`) REFERENCES `personas` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1049 DEFAULT CHARSET=utf8;

CREATE STATEMENT FOR IMPRESSIONS_PERSONAS

CREATE TABLE `impressions_personas` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `impression_id` bigint(20) unsigned NOT NULL,
  `persona_id` bigint(20) unsigned NOT NULL,
  `created` datetime DEFAULT NULL,
  `modified` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `impression_idx` (`impression_id`),
  KEY `persona_idx` (`persona_id`),
  KEY `comp_imp_persona` (`impression_id`,`persona_id`),
  KEY `comp_imp_pri` (`impression_id`,`id`),
  KEY `comp_per_pri` (`persona_id`,`id`),
  CONSTRAINT `impression` FOREIGN KEY (`impression_id`) REFERENCES `impressions` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
  CONSTRAINT `persona` FOREIGN KEY (`persona_id`) REFERENCES `personas` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=19387839 DEFAULT CHARSET=utf8;

CREATE STATEMENT FOR IMPRESSIONS

CREATE TABLE `impressions` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `device_id` bigint(20) unsigned NOT NULL,
  `beacon_id` bigint(20) unsigned NOT NULL,
  `zone_id` bigint(20) unsigned NOT NULL,
  `application_id` bigint(20) unsigned DEFAULT NULL,
  `timestamp` datetime NOT NULL,
  `google_place_id` bigint(20) unsigned DEFAULT NULL,
  `name` varchar(60) DEFAULT NULL,
  `lat` decimal(15,10) DEFAULT NULL,
  `lng` decimal(15,10) DEFAULT NULL,
  `personas_count` int(10) unsigned DEFAULT '0',
  `created` datetime DEFAULT NULL,
  `modified` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `device_idx` (`device_id`),
  KEY `zone_idx` (`zone_id`),
  KEY `beacon_id_idx2` (`beacon_id`),
  KEY `timestamp` (`timestamp`),
  KEY `appid_fk_idx_idx` (`application_id`),
  KEY `comp_lookup` (`device_id`,`beacon_id`,`timestamp`),
  KEY `timestamp_id` (`timestamp`,`id`),
  CONSTRAINT `appid_fk_idx` FOREIGN KEY (`application_id`) REFERENCES `applications` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `beacon_id` FOREIGN KEY (`beacon_id`) REFERENCES `beacons` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
  CONSTRAINT `device2` FOREIGN KEY (`device_id`) REFERENCES `devices` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `zone_FK` FOREIGN KEY (`zone_id`) REFERENCES `zones` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1582724 DEFAULT CHARSET=utf8;

Now, when I run the query without the DISTINCT and with a COUNT(*), it returns about 17,000,000 rows. Running it with DISTINCT yields 112 rows. I am not sure why so many rows show up when the EXPLAIN output shows only 159 and 396 in the rows column.
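The gap between those two counts comes from how the join works: it emits one row per matching (persona, impression) link, so a persona linked to many in-range impressions appears many times, and DISTINCT collapses those copies afterwards. A minimal sketch, using Python's built-in `sqlite3` module as a stand-in for MySQL and a tiny hypothetical dataset (not the real data), shows the effect:

```python
import sqlite3

# In-memory database with cut-down versions of the three tables.
# The rows below are hypothetical sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personas (id INTEGER PRIMARY KEY, persona TEXT NOT NULL);
    CREATE TABLE impressions (id INTEGER PRIMARY KEY, timestamp TEXT NOT NULL);
    CREATE TABLE impressions_personas (
        id INTEGER PRIMARY KEY,
        impression_id INTEGER NOT NULL,
        persona_id INTEGER NOT NULL
    );
    INSERT INTO personas VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO impressions VALUES
        (10, '2016-06-05 12:00:00'),
        (11, '2016-06-20 08:30:00'),
        (12, '2016-07-10 17:45:00'),
        (13, '2016-09-01 09:00:00');  -- outside the date range
    -- Persona 1 links to three in-range impressions, persona 2 to one,
    -- persona 3 only to the out-of-range impression.
    INSERT INTO impressions_personas (impression_id, persona_id) VALUES
        (10, 1), (11, 1), (12, 1), (10, 2), (13, 3);
""")

JOIN = """
    FROM personas AS Personas
    INNER JOIN impressions_personas AS ImpressionsPersonas
        ON Personas.id = ImpressionsPersonas.persona_id
    INNER JOIN impressions AS Impressions
        ON Impressions.id = ImpressionsPersonas.impression_id
       AND Impressions.timestamp >= '2016-06-01 00:00:00'
       AND Impressions.timestamp <= '2016-07-31 00:00:00'
"""

# One output row per matching link row: persona 1 appears three times.
joined_rows = conn.execute("SELECT COUNT(*) " + JOIN).fetchone()[0]

# DISTINCT collapses the duplicates back to one row per persona.
distinct_personas = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT Personas.id " + JOIN + ")"
).fetchone()[0]

print(joined_rows, distinct_personas)  # → 4 2
```

Scaled up to millions of link rows, the same fan-out produces the large COUNT(*) while DISTINCT still returns one row per persona.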

Some information about the tables:

The Personas table contains 159 records. The ImpressionsPersonas table contains about 12.6 million, and Impressions contains about 920,000 records.
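A quick back-of-the-envelope check with the numbers above helps explain the EXPLAIN figures. The `rows` column is an optimizer estimate per lookup (for InnoDB the MySQL manual notes it may not be exact), and the manual suggests multiplying the `rows` values across a join for a rough count of rows examined. The sketch below assumes, hypothetically, that link rows are spread evenly across personas:

```python
import math

# Figures copied from the EXPLAIN output and the table sizes quoted above.
explain_rows = [159, 396, 1]  # "rows" for Personas, ImpressionsPersonas, Impressions
links_total = 12_600_000      # impressions_personas: "about 12.6 million" rows
personas_total = 159

# Product of the rows column: a rough estimate of joined rows examined.
estimated_examined = math.prod(explain_rows)

# Average link rows per persona, assuming (hypothetically) an even spread.
avg_links_per_persona = links_total / personas_total

print(estimated_examined)            # → 62964
print(round(avg_links_per_persona))  # → 79245
```

The roughly 63,000 rows implied by EXPLAIN are far below the ~17,000,000 that the COUNT(*) actually returned, and the 396-per-persona estimate is about 200 times smaller than the average computed from the table sizes.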

What we are doing is selecting from the Personas table and joining to Impressions through the join table ImpressionsPersonas, with filters applied to the Impressions table (a date range in this case).

Note: removing the date filter had a negligible impact on the execution time, which hovers around 120 s. Is there a way to filter these records down to reduce the execution time of this query?

Barry Chapman asked Oct 30 '22 22:10


1 Answer

I presume that you want the list of personas that have at least one impression within a specified time period. To get this, you can use a correlated subquery like this:

    SELECT
        Personas.id,
        Personas.parent_id,
        Personas.persona,
        Personas.subpersonas_count,
        Personas.is_subpersona,
        Personas.impressions_count,
        Personas.created,
        Personas.modified
    FROM personas AS Personas
    -- Keep a persona if at least one linked impression falls in the date range
    WHERE EXISTS (
        SELECT 1
        FROM impressions_personas AS ImpressionsPersonas
        INNER JOIN impressions AS Impressions
            ON Impressions.id = ImpressionsPersonas.impression_id
        WHERE ImpressionsPersonas.persona_id = Personas.id
          AND Impressions.timestamp >= '2016-06-01 00:00:00'
          AND Impressions.timestamp <= '2016-07-31 00:00:00'
    )
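To check that the EXISTS rewrite returns the same personas as the original DISTINCT join, here is a small sketch using Python's `sqlite3` module on hypothetical sample rows. SQLite stands in for MySQL here, so this only demonstrates that the two queries agree, not how fast either one runs:

```python
import sqlite3

# Hypothetical sample data; only the columns the queries touch are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE personas (id INTEGER PRIMARY KEY, persona TEXT NOT NULL);
    CREATE TABLE impressions (id INTEGER PRIMARY KEY, timestamp TEXT NOT NULL);
    CREATE TABLE impressions_personas (
        id INTEGER PRIMARY KEY,
        impression_id INTEGER NOT NULL,
        persona_id INTEGER NOT NULL
    );
    INSERT INTO personas VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO impressions VALUES
        (10, '2016-06-05 12:00:00'),
        (11, '2016-07-10 17:45:00'),
        (12, '2016-09-01 09:00:00');  -- outside the date range
    INSERT INTO impressions_personas (impression_id, persona_id) VALUES
        (10, 1), (11, 1), (11, 2), (12, 3);
""")

# Original approach: join everything, then de-duplicate.
DISTINCT_JOIN = """
    SELECT DISTINCT Personas.id
    FROM personas AS Personas
    INNER JOIN impressions_personas AS ImpressionsPersonas
        ON Personas.id = ImpressionsPersonas.persona_id
    INNER JOIN impressions AS Impressions
        ON Impressions.id = ImpressionsPersonas.impression_id
       AND Impressions.timestamp >= '2016-06-01 00:00:00'
       AND Impressions.timestamp <= '2016-07-31 00:00:00'
"""

# Answer's approach: one row per persona, kept if any in-range link exists.
EXISTS_QUERY = """
    SELECT Personas.id
    FROM personas AS Personas
    WHERE EXISTS (
        SELECT 1
        FROM impressions_personas AS ImpressionsPersonas
        INNER JOIN impressions AS Impressions
            ON Impressions.id = ImpressionsPersonas.impression_id
        WHERE ImpressionsPersonas.persona_id = Personas.id
          AND Impressions.timestamp >= '2016-06-01 00:00:00'
          AND Impressions.timestamp <= '2016-07-31 00:00:00'
    )
"""

distinct_ids = sorted(row[0] for row in conn.execute(DISTINCT_JOIN))
exists_ids = sorted(row[0] for row in conn.execute(EXISTS_QUERY))
print(distinct_ids, exists_ids)  # → [1, 2] [1, 2]
```

Because an EXISTS condition is satisfied by the first matching row, the database can stop searching for a persona once one in-range link is found, instead of producing every matching row and de-duplicating afterwards.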
IVO GELOV answered Nov 14 '22 02:11