I currently use the following query, which takes about 8 minutes to return a result due to the volume of data (about 14 months). Is there a way I can speed this up, please?
The database in question is MySQL with the InnoDB engine.
select
    CUSTOMER as CUST,
    SUM(IF(PAGE_TYPE = 'C', PAGE_TYPE_COUNT, 0)) AS TOTAL_C,
    SUM(IF(PAGE_TYPE = 'D', PAGE_TYPE_COUNT, 0)) AS TOTAL_D
from
    PAGE_HITS
where
    EVE_DATE >= '2016-01-01' and EVE_DATE <= '2016-01-05'
    and SITE = 'P'
    and SITE_SERV like 'serv1X%'
group by
    CUST
Data is partitioned in 6-month ranges. Every column that appears in the where clause is indexed. There are quite a few indexes, too many to list in full, so I'll summarize in words. The composite indexes relevant to this query are EVE_DATE + PAGE_TYPE_COUNT, CUST + SITE_SERV + EVE_DATE, EVE_DATE + SITE_SERV, and EVE_DATE + SITE.
The primary key is actually a dummy auto-increment number; honestly, it isn't used. I don't have access to the explain plan, but I'll see what I can do about that.
I'd appreciate any help to improve this one please.
I don't have the data, so I can't test the speed of this, but I think it would be faster.
select
    CUSTOMER as CUST,
    SUM(PAGE_TYPE_COUNT * (PAGE_TYPE = 'C')) AS TOTAL_C,
    SUM(PAGE_TYPE_COUNT * (PAGE_TYPE = 'D')) AS TOTAL_D
from
    PAGE_HITS
where
    EVE_DATE >= '2016-01-01' and EVE_DATE <= '2016-01-05'
    and SITE = 'P'
    and SITE_SERV like 'serv1X%'
group by
    CUST
It worked just fine on my fiddle on MySQL 5.6.
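If you want to convince yourself that the two forms agree, here is a tiny self-contained check (no real table needed; the column aliases `t` and `n` are just placeholders):

```sql
-- IF(...) vs. multiplying by a boolean: in MySQL a comparison
-- evaluates to 1 or 0, so the multiplication zeroes out
-- non-matching rows exactly like the IF does.
-- Only the 'C' row contributes, so both columns return 3.
SELECT SUM(IF(t = 'C', n, 0)) AS via_if,
       SUM(n * (t = 'C'))     AS via_boolean
FROM (SELECT 'C' AS t, 3 AS n
      UNION ALL
      SELECT 'D', 5) AS x;
```

Note that this boolean-arithmetic idiom is MySQL-specific; standard SQL would use `CASE WHEN`.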
Okay: as the table's range partitioning is on EVE_DATE, the DBMS should easily see which partitions to read. So it's all about which index to use, then.
There is one column you check for equality (SITE = 'P'); this should come first in your index. You can then add EVE_DATE and SITE_SERV in whatever order, I guess. Such an index lets the DBMS locate the table records in question as quickly as possible.
If, however, you add the other fields used in your query to the index, the table wouldn't even have to be read, because all the data would be available in the index itself:
create index ix_page_hits_covering on page_hits (site, eve_date, site_serv, customer, page_type, page_type_count);
This should be the optimal index for your query if I am not mistaken.
The main optimization factor is indexing. One index should match your query as closely as possible, for example:
EVE_DATE, SITE, CUST, SITE_SERV
The order is important, at least for SITE_SERV as the last column: because you use LIKE on it, only a prefix of the value can be matched, which would degrade index efficiency for any columns after it.
You might also gain a tiny bit by removing the IF and returning the raw type and count; maybe you can process/format those values in your front-end application?
Anyway, you should start by profiling your current query with EXPLAIN to see what goes wrong. If you can't, you could try reproducing the structure, the indexes, and a little dummy data on a local DB; volume is irrelevant for reading the plan.
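For example, prefixing the exact query with EXPLAIN returns the plan without actually scanning the data; on MySQL 5.6 (mentioned elsewhere in this thread), EXPLAIN PARTITIONS additionally shows which partitions are touched:

```sql
EXPLAIN PARTITIONS
SELECT CUSTOMER AS CUST,
       SUM(IF(PAGE_TYPE = 'C', PAGE_TYPE_COUNT, 0)) AS TOTAL_C,
       SUM(IF(PAGE_TYPE = 'D', PAGE_TYPE_COUNT, 0)) AS TOTAL_D
FROM PAGE_HITS
WHERE EVE_DATE >= '2016-01-01' AND EVE_DATE <= '2016-01-05'
  AND SITE = 'P'
  AND SITE_SERV LIKE 'serv1X%'
GROUP BY CUST;
-- In the output, check `partitions` (which partitions are read),
-- `key` (the index actually chosen), `rows` (the row estimate), and
-- `Extra`: "Using index" means the query is covered by the index alone.
```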
Add these two indexes:
INDEX(SITE, EVE_DATE)
INDEX(SITE, SITE_SERV)
The optimizer will look at the statistics and pick between them. Roughly speaking, the first would be better if there are fewer rows matching 'P' plus the date range than matching 'P' plus 'serv1X%'.
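Spelled out with the actual column names from the question (the index names here are just placeholders):

```sql
-- Two narrow alternatives; the optimizer picks whichever is
-- more selective for the given date range and SITE_SERV prefix.
ALTER TABLE PAGE_HITS
    ADD INDEX ix_site_date (SITE, EVE_DATE),
    ADD INDEX ix_site_serv (SITE, SITE_SERV);
```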
Yes, the "covering" index that Thorsten suggested might be better, but it has more fields than I like to put in an index.
PARTITIONing might help, but there is too little info to say for sure. The reason it might help is that you have a "2-dimensional" lookup: a date range and 'serv1X%'. You would need to partition on either date or site_serv, then have PRIMARY KEY(site, ..., ...) with the other of (date or site_serv) as the second column. The remaining key columns would need to include both the partition key and some column to make it unique. That gets so messy that I don't want to think it through.