Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL: Unable to select the records from specific partitions?

I am working with MySQL 5.6. I had created a table with 366 partitions to save data daywise means In a year we have maximum 366 days so I had created 366 partitions on that table. The hash partitions were managed by an integer column which stores 1 to 366 for each record.

Report_Summary Table:

CREATE TABLE `Report_Summary` (
  `PartitionsID` int(4) unsigned NOT NULL,
  `ReportTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `Amount` int(10) NOT NULL,
  UNIQUE KEY `UNIQUE` (`PartitionsID`,`ReportTime`),
  KEY `PartitionsID` (`PartitionsID`),
  KEY `ReportTime` (`ReportTime`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED
/*!50100 PARTITION BY HASH (PartitionsID)
PARTITIONS 366 */

My current query:

SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
FROM Report_Summary RS
WHERE RS.ReportTime >= '2014-12-26 00:00:00' AND RS.ReportTime <= '2014-12-30 23:59:59' AND 
      RS.PartitionsID BETWEEN DAYOFYEAR('2014-12-26 00:00:00') AND DAYOFYEAR('2014-12-30 23:59:59')
GROUP BY ReportDate; 

The above query is perfectly working and using partitions p360 to p364 to fetch the data. Now the problem is when I pass the fromDate to '2014-12-26' and toDate to '2015-01-01' Then above query won't work. Because the Day of year for '2015-01-01' is 1 so my conditions got failed.

Now I had tried for passing the value in IN operator then it works perfectly in database check below query:

SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
FROM Report_Summary RS
WHERE RS.ReportTime >= '2014-12-26 00:00:00' AND RS.ReportTime <= '2015-01-01 23:59:59' AND 
      RS.PartitionsID IN (360,361,362,363,364,365,1)
GROUP BY ReportDate; 

To generate above scenario I had created a function and passed two dates and generate a comma seperated string of the IDs

SELECT GenerateRange('2014-12-26 00:00:00', '2015-01-01 23:59:59');

Which reurns me data as:

'360,361,362,363,364,365,366,1'

And I tried to use that function in my query so I had changed my query as below:

SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
FROM Report_Summary RS
WHERE RS.ReportTime >= '2014-12-26 00:00:00' AND RS.ReportTime <= '2015-01-01 23:59:59' AND 
      FIND_IN_SET(RS.PartitionsID, GenerateRange('2014-12-26 00:00:00', '2015-01-01 00:00:00'))
GROUP BY ReportDate; 

Then I had checked the execution plan of above query using EXPLAIN PARTITION SELECT.... And I found thet my condition won't work. It uses all partitions to fetch data. I want to use the specific partitions of those dates only. It must be check only these 360,361,362,363,364,365,366,1 partitions means p360 to p366 and p1.

Why my query is not working? And this is not right way to implement this then I want solution How can I achieve this?

I know from coding I can implement this but I have to write a query to implement this.

Thanks...

like image 414
Saharsh Shah Avatar asked Nov 10 '22 21:11

Saharsh Shah


1 Answers

There are a few options that I can think of.

  1. Create case statements that cover multi-year search criteria.
  2. Create a CalendarDays table and use it to get the distinct list of DayOfYear for your in clause.
  3. Variation of option 1 but using a union to search each range separately

Option 1: Using case statements. It is not pretty, but seems to work. There is a scenario where this option could search one extra partition, 366, if the query spans years in a non-leap year. Also I'm not certain that the optimizer will like the OR in the RS.ParitionsID filter, but you can try it out.

SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
FROM Report_Summary RS
WHERE RS.ReportTime >= @startDate AND RS.ReportTime <= @endDate
    AND 
    (
    RS.PartitionsID BETWEEN 
        CASE 
            WHEN
                --more than one year, search all days 
                year(@endDate) - year(@startDate) > 1
                --one full year difference 
                OR year(@endDate) - year(@startDate) = 1 
                    AND DAYOFYEAR(@startDate) <= DAYOFYEAR(@endDate)
            THEN 1
            ELSE DAYOFYEAR(@startDate)
        END
        and 
        CASE
            WHEN 
                --query spans the end of a year
                year(@endDate) - year(@startDate) >= 1
            THEN 366
            ELSE DAYOFYEAR(@endDate)
        END
    --Additional query to search less than portion of next year
    OR RS.PartitionsID <=
        CASE
            WHEN year(@endDate) - year(@startDate) > 1
                OR DAYOFYEAR(@startDate) > DAYOFYEAR(@endDate)
            THEN DAYOFYEAR(@endDate)
            ELSE NULL
        END
    )
GROUP BY ReportDate;

Option 2: Using CalendarDays table. This option is much cleaner. The downside is you will need to create a new CalendarDays table if you do not have one.

SELECT DATE(RS.ReportTime) AS ReportDate, SUM(RS.Amount) AS Total
FROM Report_Summary RS
WHERE RS.ReportTime >= @startDate AND RS.ReportTime <= @endDate
    AND RS.PartitionsID IN
    (
        SELECT DISTINCT DAYOFYEAR(c.calDate) 
        FROM dbo.calendarDays c
        WHERE c.calDate >= @startDate and c.calDate <= @endDate
    )

EDIT: Option 3: variation of option 1, but using Union All to search each range separately. The idea here is that since there is not an OR in the statement, that the optimizer will be able to apply the partition pruning. Note: I do not normally work in MySQL, so my syntax may be a little off, but the general idea is there.

DECLARE @startDate datetime, @endDate datetime;
DECLARE @rangeOneStart datetime, @rangeOneEnd datetime, @rangeTwoStart datetime, @rangeTwoEnd datetime;

SELECT @rangeOneStart := 
        CASE 
            WHEN
                --more than one year, search all days 
                year(@endDate) - year(@startDate) > 1
                --one full year difference 
                OR year(@endDate) - year(@startDate) = 1 
                    AND DAYOFYEAR(@startDate) <= DAYOFYEAR(@endDate)
            THEN 1
            ELSE DAYOFYEAR(@startDate)
        END
    , @rangeOneEnd := 
        CASE
            WHEN 
                --query spans the end of a year
                year(@endDate) - year(@startDate) >= 1
            THEN 366
            ELSE DAYOFYEAR(@endDate)
        END 
    , @rangeTwoStart := 1
    , @rangeTwoEnd := 
        CASE
            WHEN year(@endDate) - year(@startDate) > 1
                OR DAYOFYEAR(@startDate) > DAYOFYEAR(@endDate)
            THEN DAYOFYEAR(@endDate)
            ELSE NULL
        END
;

SELECT t.ReportDate, sum(t.Amount) as Total
FROM 
(
    SELECT DATE(RS.ReportTime) AS ReportDate, RS.Amount
    FROM Report_Summary RS
    WHERE RS.PartitionsID BETWEEN @rangeOneStart AND @rangeOneEnd
        AND RS.ReportTime >= @startDate AND RS.ReportTime <= @endDate

    UNION ALL

    SELECT DATE(RS.ReportTime) AS ReportDate, RS.Amount
    FROM Report_Summary RS
    WHERE RS.PartitionsID BETWEEN @rangeTwoStart AND @rangeTwoEnd
        AND @rangeTwoEnd IS NOT NULL
        AND RS.ReportTime >= @startDate AND RS.ReportTime <= @endDate
) t
GROUP BY ReportDate;
like image 120
BateTech Avatar answered Nov 14 '22 22:11

BateTech