
Optimize MySQL query with timezone conversion and GROUP BY hour

Tags: sql, mysql

This is my table in MySQL 5.5. It has 30 million records:

CREATE TABLE `campaign_logs` (
  `domain` varchar(50) DEFAULT NULL,
  `campaign_id` varchar(50) DEFAULT NULL,
  `subscriber_id` varchar(50) DEFAULT NULL,
  `message` varchar(21000) DEFAULT NULL,
  `log_time` datetime DEFAULT NULL,
  `log_type` varchar(50) DEFAULT NULL,
  `level` varchar(50) DEFAULT NULL,
  `campaign_name` varchar(500) DEFAULT NULL,
  KEY `subscriber_id_index` (`subscriber_id`),
  KEY `log_type_index` (`log_type`),
  KEY `log_time_index` (`log_time`),
  KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`),
  KEY `domain_logtype_logtime_index` (`domain`,`log_type`,`log_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

In the following query, I group by hour after converting log_time to a specific timezone:

QUERY

SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_OPENED' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date

UNION ALL

SELECT
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d 
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index) 
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_SENT' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date 

UNION ALL 

SELECT 
    log_type
    ,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
    ,count(*) AS total
    ,count(DISTINCT subscriber_id) d
FROM
    stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
    DOMAIN='xxx' 
    AND campaign_id='123' 
    AND log_type = 'EMAIL_CLICKED' 
    AND log_time BETWEEN 
        CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
        CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date;

RESULTS

The query above returns results like this:

+---------------+-------+----------------+-------------+
| EMAIL_CLICKED | 1 AM  |             71 |          83 |
| EMAIL_CLICKED | 1 PM  |             25 |          27 |
| EMAIL_SENT    | 10 AM |             51 |          59 |
| EMAIL_OPENED  | 10 PM |             16 |          18 |
+---------------+-------+----------------+-------------+

This is the EXPLAIN output for the query above:

EXPLAIN

+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
| id | select_type  | table         | type  | possible_keys                             | key                                       | key_len | ref  | rows   | Extra                                    |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
|  1 | PRIMARY      | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |  55074 | Using where; Using index; Using filesort |
|  2 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL | 330578 | Using where; Using index; Using filesort |
|  3 | UNION        | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468     | NULL |   1589 | Using where; Using index; Using filesort |
|NULL| UNION RESULT | <union1,2,3>  | ALL   | NULL                                      | NULL                                      | NULL    | NULL |   NULL |                                          |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+

OPTIMIZATION?

We have a covering index on this table.
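As a rough check of that claim, the key_len of 468 in the EXPLAIN output is consistent with the range scan using the first four columns of the index (campaign_id, domain, log_type, log_time), with subscriber_id then read from the index itself ("Using index"). The arithmetic below is a sketch that assumes MySQL 5.5 key sizes: a nullable utf8 VARCHAR(50) takes up to 50 * 3 bytes plus 2 length bytes plus 1 NULL-flag byte, and a nullable DATETIME takes 8 bytes plus 1 NULL-flag byte. The alias expected_key_len is only illustrative.

```sql
-- Three nullable utf8 VARCHAR(50) key parts plus one nullable DATETIME key part.
-- Assumes MySQL 5.5 sizes: 3 bytes per utf8 character, 8-byte DATETIME.
SELECT 3 * (50 * 3 + 2 + 1)   -- campaign_id, domain, log_type: 153 bytes each
       + (8 + 1)              -- log_time: 9 bytes
       AS expected_key_len;   -- 468, the key_len shown in the EXPLAIN rows
```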

This query takes a long time (more than 1 minute).

If I remove count(DISTINCT subscriber_id) from the query, I get results in 1.5 seconds, but I need the distinct count of subscriber_id.

Is there any way to optimize this query?

Thanks

Rams asked Nov 01 '22 09:11


1 Answer

You are not processing a tremendous amount of data, so the GROUP BY should not take 40 seconds, assuming that you are not on a really busy server with lots of lock activity on the table.

Try this version of the query (limited to one log_type):

SELECT log_type,
       DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time,
       count(DISTINCT subscriber_id) AS distinct_count,
       count(subscriber_id) AS total_count
FROM stats.campaign_logs
WHERE DOMAIN = 'xxxx' AND
      campaign_id='1234' AND
      log_type = 'EMAIL_SENT' AND
      log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
GROUP BY time;

This should use the index optimally. If it runs fast, use UNION ALL to bring the rows for each log_type together. It is ugly, but sometimes UNION ALL is much faster than OR/IN because of index optimizations.
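If count(DISTINCT subscriber_id) remains the slow part even for a single log_type, one alternative worth trying is to collapse the rows to one per subscriber and hour first, and then count those rows. The query below is only a sketch of that idea: it is not part of the original answer and has not been benchmarked against this table, and the names per_subscriber and hits are made up for illustration. It relies on MySQL's default behavior of allowing a select alias in GROUP BY, as the queries above already do.

```sql
-- Sketch only: two-step aggregation in place of COUNT(DISTINCT ...).
-- The inner query yields one row per (log_type, hour, subscriber_id);
-- the outer query counts those rows to get the distinct subscribers.
SELECT log_type,
       log_date,
       COUNT(subscriber_id) AS distinct_count,  -- skips the NULL-subscriber group, like COUNT(DISTINCT)
       SUM(hits)            AS total_count      -- all log rows in the hour, like count(*)
FROM (
    SELECT log_type,
           DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date,
           subscriber_id,
           COUNT(*) AS hits
    FROM stats.campaign_logs
    WHERE domain = 'xxx'
      AND campaign_id = '123'
      AND log_type = 'EMAIL_SENT'
      AND log_time BETWEEN CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30')
                       AND CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
    GROUP BY log_type, log_date, subscriber_id
) AS per_subscriber
GROUP BY log_type, log_date;
```

Comparing the EXPLAIN output and timings of this form with the count(DISTINCT ...) version on the real data would show whether it helps. If it does, the same pattern can be repeated per log_type and combined with UNION ALL.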

Gordon Linoff answered Nov 15 '22 06:11