This is my table in MySql 5.5 having 30 million records
CREATE TABLE `campaign_logs` (
`domain` varchar(50) DEFAULT NULL,
`campaign_id` varchar(50) DEFAULT NULL,
`subscriber_id` varchar(50) DEFAULT NULL,
`message` varchar(21000) DEFAULT NULL,
`log_time` datetime DEFAULT NULL,
`log_type` varchar(50) DEFAULT NULL,
`level` varchar(50) DEFAULT NULL,
`campaign_name` varchar(500) DEFAULT NULL,
KEY `subscriber_id_index` (`subscriber_id`),
KEY `log_type_index` (`log_type`),
KEY `log_time_index` (`log_time`),
KEY `campid_domain_logtype_logtime_subid_index` (`campaign_id`,`domain`,`log_type`,`log_time`,`subscriber_id`),
KEY `domain_logtype_logtime_index` (`domain`,`log_type`,`log_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
In the following query, I'm doing group by hour with respect to timezone
QUERY
SELECT
log_type
,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
,count(*) AS total
,count(DISTINCT subscriber_id) d
FROM
stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
DOMAIN='xxx'
AND campaign_id='123'
AND log_type = 'EMAIL_OPENED'
AND log_time BETWEEN
CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date
UNION ALL
SELECT
log_type
,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
,count(*) AS total
,count(DISTINCT subscriber_id) d
FROM
stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
DOMAIN='xxx'
AND campaign_id='123'
AND log_type = 'EMAIL_SENT'
AND log_time BETWEEN
CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date
UNION ALL
SELECT
log_type
,DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS log_date
,count(*) AS total
,count(DISTINCT subscriber_id) d
FROM
stats.campaign_logs USE INDEX(campid_domain_logtype_logtime_subid_index)
WHERE
DOMAIN='xxx'
AND campaign_id='123'
AND log_type = 'EMAIL_CLICKED'
AND log_time BETWEEN
CONVERT_TZ('2015-02-01 00:00:00','+00:00','+05:30') AND
CONVERT_TZ('2015-03-01 23:59:58','+00:00','+05:30')
GROUP BY log_date;
RESULTS
The above query will give results like this
+---------------+-------+----------------+-------------+
| EMAIL_CLICKED | 1 AM | 71 | 83 |
| EMAIL_CLICKED | 1 PM | 25 | 27 |
| EMAIL_SENT | 10 AM | 51 | 59 |
| EMAIL_OPENED | 10 PM | 16 | 18 |
This is the explain of above query
EXPLAIN
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
| 1 | PRIMARY | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468 | NULL | 55074 | Using where; Using index; Using filesort |
| 2 | UNION | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468 | NULL | 330578 | Using where; Using index; Using filesort |
| 3 | UNION | campaign_logs | range | campid_domain_logtype_logtime_subid_index | campid_domain_logtype_logtime_subid_index | 468 | NULL | 1589 | Using where; Using index; Using filesort |
|NULL| UNION RESULT | <union1,2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+---------------+-------+-------------------------------------------+-------------------------------------------+---------+------+--------+------------------------------------------+
OPTIMIZATION ?
We have a covering index on this table.
This query is taking long time (more than 1 minute).
If I remove the distinct_count(subscriber_id)
from the query, then we are getting results in 1.5 sec, but I need distinct_count
of subscriber_id
from the query.
Is there any way to optimize this query ?
Thanks
You are not processing a tremendous amount of data, so the group by
should not be taking 40 seconds -- assuming that you are not on a really busy server with lots of lock activity on the table.
Try this version of the query (limited to one log_type
):
SELECT log_type,
DATE_FORMAT(CONVERT_TZ(log_time,'+00:00','+05:30'),'%l %p') AS time,
count(DISTINCT subscriber_id) AS distinct_count,
count(subscriber_id) AS total_count
FROM stats.campaign_logs
WHERE DOMAIN = 'xxxx' AND
campaign_id='1234' AND
log_type = 'EMAIL_SENT' AND
log_time BETWEEN CONVERT_TZ('2015-02-07 00:00:00','+00:00','+05:30') AND CONVERT_TZ('2015-02-14 23:59:58','+00:00','+05:30')
GROUP BY time;
This should use the index optimally. If this goes fast, then use union all
to bring the rows together. Ugly, but sometimes union all
is much faster than OR
/IN
because of index optimizations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With