I currently use the following query, which takes about 8 minutes to return a result due to the volume of data (about 14 months). Is there a way I can speed this up, please?
The database in question is MySQL with the InnoDB engine.
select
    CUSTOMER as CUST,
    SUM(IF(PAGE_TYPE = 'C', PAGE_TYPE_COUNT, 0)) AS TOTAL_C,
    SUM(IF(PAGE_TYPE = 'D', PAGE_TYPE_COUNT, 0)) AS TOTAL_D
from
    PAGE_HITS
where
    EVE_DATE >= '2016-01-01' and EVE_DATE <= '2016-01-05'
    and SITE = 'P'
    and SITE_SERV like 'serv1X%'
group by
    CUST
Data is partitioned in 6-month ranges. Every column that appears in the where clause is indexed. There are quite a few indexes, too many to list in full, so I'll summarize in words. The composite indexes relevant to this query are EVE_DATE + PAGE_TYPE_COUNT, CUST + SITE_SERV + EVE_DATE, EVE_DATE + SITE_SERV, and EVE_DATE + SITE.
The primary key is actually a dummy auto-increment number; honestly, it isn't used. I don't have access to the explain plan, but I'll see what I can do about that.
I'd appreciate any help to improve this one please.
I don't have the data, so I can't test the speed of this, but I think it would be faster.
select
    CUSTOMER as CUST,
    SUM(PAGE_TYPE_COUNT * (PAGE_TYPE = 'C')) AS TOTAL_C,
    SUM(PAGE_TYPE_COUNT * (PAGE_TYPE = 'D')) AS TOTAL_D
from
    PAGE_HITS
where
    EVE_DATE >= '2016-01-01' and EVE_DATE <= '2016-01-05'
    and SITE = 'P'
    and SITE_SERV like 'serv1X%'
group by
    CUST
It worked just fine on my fiddle on MySQL 5.6.
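If you want to convince yourself that the two forms agree, here is a tiny self-contained check (no real table needed; the column aliases `t` and `n` are just placeholders):

```sql
-- IF(...) vs. multiplying by a boolean: in MySQL a comparison
-- evaluates to 1 or 0, so the multiplication zeroes out
-- non-matching rows exactly like the IF does.
-- Only the 'C' row contributes, so both columns return 3.
SELECT SUM(IF(t = 'C', n, 0)) AS via_if,
       SUM(n * (t = 'C'))     AS via_boolean
FROM (SELECT 'C' AS t, 3 AS n
      UNION ALL
      SELECT 'D', 5) AS x;
```

Note that this boolean-arithmetic idiom is MySQL-specific; standard SQL would use `CASE WHEN`.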
Okay: as the table's range partitioning is on EVE_DATE, the DBMS should easily see which partitions to read. So it's all about which index to use, then.
There is one column you check for equality (SITE = 'P'); this should come first in your index. You can then add EVE_DATE and SITE_SERV in whatever order, I guess. Such an index lets the DBMS locate the table records in question as quickly as possible.
If, however, you add the other fields used in your query to the index, the table wouldn't even have to be read, because all the data would be available in the index itself:
create index ix_page_hits_covering on page_hits (site, eve_date, site_serv, customer, page_type, page_type_count);
This should be the optimal index for your query if I am not mistaken.
The main optimization factor is indexing. One index should match your query as closely as possible, for example:
EVE_DATE, SITE, CUST, SITE_SERV
The order is important, at least for SITE_SERV as the last column: because you use LIKE on it, only a prefix of the value can be matched, which would degrade index efficiency for any columns after it.
You might also gain a tiny bit by removing the IF and returning the raw type and count; maybe you can process/format those values in your front-end application?
Anyway, you should start by profiling your current query with EXPLAIN to see what goes wrong. If you can't, you could try reproducing the structure, the indexes, and a little dummy data on a local DB; volume is irrelevant for reading the plan.
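For example, prefixing the exact query with EXPLAIN returns the plan without actually scanning the data; on MySQL 5.6 (mentioned elsewhere in this thread), EXPLAIN PARTITIONS additionally shows which partitions are touched:

```sql
EXPLAIN PARTITIONS
SELECT CUSTOMER AS CUST,
       SUM(IF(PAGE_TYPE = 'C', PAGE_TYPE_COUNT, 0)) AS TOTAL_C,
       SUM(IF(PAGE_TYPE = 'D', PAGE_TYPE_COUNT, 0)) AS TOTAL_D
FROM PAGE_HITS
WHERE EVE_DATE >= '2016-01-01' AND EVE_DATE <= '2016-01-05'
  AND SITE = 'P'
  AND SITE_SERV LIKE 'serv1X%'
GROUP BY CUST;
-- In the output, check `partitions` (which partitions are read),
-- `key` (the index actually chosen), `rows` (the row estimate), and
-- `Extra`: "Using index" means the query is covered by the index alone.
```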
Add these two indexes:
INDEX(SITE, EVE_DATE)
INDEX(SITE, SITE_SERV)
The optimizer will look at the statistics and pick between them. Roughly speaking, the first would be better if there are fewer rows matching 'P' plus the date range than matching 'P' plus 'serv1X%'.
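Spelled out with the actual column names from the question (the index names here are just placeholders):

```sql
-- Two narrow alternatives; the optimizer picks whichever is
-- more selective for the given date range and SITE_SERV prefix.
ALTER TABLE PAGE_HITS
    ADD INDEX ix_site_date (SITE, EVE_DATE),
    ADD INDEX ix_site_serv (SITE, SITE_SERV);
```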
Yes, the "covering" index that Thorsten suggested might be better, but it has more fields than I like to put in an index.
PARTITIONing might help, but there is too little info to say for sure. The reason it might help is that you have a "2-dimensional" lookup: a date range and 'serv1X%'. You would need to partition on either date or site_serv, then have PRIMARY KEY(site, ..., ...) with the other of (date or site_serv) as the second column. The remaining key columns would need to include both the partition key and some column to make it unique. That gets so messy that I don't want to think it through.