Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest way of summarizing sales based on today, this week, this month, this quarter this year?

Tags:

The following query returns a correct result but how do I get the same result faster?

The goal is to output a table for tracking sellers progress by summarizing their sales today, this week, month and quarter.

SellerID    Today                 ThisWeek              ThisMonth             ThisQuarter
----------- --------------------- --------------------- --------------------- ---------------------
1           400,00                700,00                900,00                900,00
2           950,00                1850,00               2650,00               2650,00

My query:

CREATE TABLE #sales(
    [Price] MONEY,
    [Date] DATE,
    [SellerID] INT
)

INSERT INTO #sales VALUES 
(100, '2012-01-01', 1),
(200, '2012-04-01',1),
(300, '2012-04-23',1),
(400, '2012-04-27',1),
(700, '2012-01-01', 2),
(700, '2012-01-02', 2),
(800, '2012-04-01',2),
(900, '2012-04-23',2),
(950, '2012-04-27',2)


SELECT 
SellerID AS SellerID,

SUM(CASE WHEN [Date] >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()),0) THEN [Price] END) AS Today,
SUM(CASE WHEN [Date] >= DATEADD(WEEK, DATEDIFF(WEEK, 0, GETDATE()), 0) THEN [Price] END) AS ThisWeek,
SUM(CASE WHEN [Date] >= DATEADD(MONTH, DATEDIFF(MONTH, 0, GETDATE()), 0) THEN [Price] END) AS ThisMonth,
SUM(CASE WHEN [Date] >= DATEADD(QUARTER, DATEDIFF(QUARTER, 0, GETDATE()), 0) THEN [Price] END) AS ThisQuarter

FROM #sales
WHERE DATEPART(YEAR, [Date]) = DATEPART(YEAR, GETDATE()) 
GROUP BY SellerID

When executing the same query on a larger table this gets quite slow. Just removing the CASE-statements cuts the execution time by almost 50%.

How can I achieve the same result in a faster and more efficient way?

like image 444
Jonas Stensved Avatar asked Apr 27 '12 14:04

Jonas Stensved


1 Answers

Since it is Friday afternoon, I thought I'd expand on my comment regarding warehousing. even if you cannot fully explore cubes with SSAS or any other OLAP you can still do your own report specific warehousing. In your case I would set up a new Database (I always call mine DW but the world is your oyster), and create 2 schemas Fact and Dim (representing facts and dimensions). In your case it would need 2 tables, although you may want to add another dimension for "SellerID" depending on if this needs further reporting on.

CREATE TABLE Dim.Date
(       DateKey     DATE NOT NULL,
        DayOfWeek   VARCHAR(20) NOT NULL,
        Day         TINYINT NOT NULL,
        Week        TINYINT NOT NULL,
        Quarter     TINYINT NOT NULL,
        Month       TINYINT NOT NULL,
        Year        SMALLINT NOT NULL
    CONSTRAINT PK_Dim_Date_DateKey PRIMARY KEY (DateKey)
)
CREATE TABLE Fact.Sales
(       DateKey     DATE NOT NULL,
        SellerID    INT NOT NULL,
        Sales       INT NOT NULL,
        Amount      MONEY NOT NULL,
    CONSTRAINT PK_Fact_Sales PRIMARY KEY (DateKey, SellerID),
    CONSTRAINT FK_Fact_Sales_DateKey FOREIGN KEY (DateKey) REFERENCES Dim.Date
)

Assuming the data will not get backdated you can use a procedure like this to fill your warehouse on a scheduled job:

DECLARE @MaxDate DATE
SELECT  @MaxDate = DATEADD(DAY, 1, MAX(DateKey))
FROM    Fact.Sales

INSERT INTO Dim.Date
SELECT  DATEADD(DAY, Increment, @MaxDate), 
        DATENAME(WEEKDAY, DATEADD(DAY, Increment, @MaxDate)), 
        DATEPART(DAY, DATEADD(DAY, Increment, @MaxDate)),
        DATEPART(WEEK, DATEADD(DAY, Increment, @MaxDate)),
        DATEPART(MONTH, DATEADD(DAY, Increment, @MaxDate)),
        DATEPART(QUARTER, DATEADD(DAY, Increment, @MaxDate)),
        DATEPART(YEAR, DATEADD(DAY, Increment, @MaxDate))
FROM    (   SELECT  ROW_NUMBER() OVER(ORDER BY Object_ID) - 1 [Increment]
            FROM    Sys.Objects
        ) obj
WHERE   NOT EXISTS
        (   SELECT  1
            FROM    Dim.Date
            WHERE   Date.DateKey = DATEADD(DAY, Increment, @MaxDate)
        )


INSERT INTO Fact.Sales
SELECT  [Date], SellerID, COUNT(*), SUM(Price)
FROM    LiveDatabase..Sales
WHERE   [Date] >= @MaxDate
GROUP BY [Date], SellerID

This would leave you with the following query to produce your report

SELECT  SellerID,
        SUM(CASE WHEN Today.DateKey = Date.DateKey THEN Amount ELSE O END) [Today],
        SUM(CASE WHEN Today.Week = Date.Week THEN Amount ELSE O END) [ThisWeek],
        SUM(CASE WHEN Today.Month = Date.Month THEN Amount ELSE O END) [ThisMonth],
        SUM(CASE WHEN Today.Quarter = Date.Quarter THEN Amount ELSE O END) [ThisQuarter],
        SUM(CASE WHEN Today.Year = Date.Year THEN Amount ELSE O END) [ThisYear]
FROM    Fact.Sales
        INNER JOIN Dim.Date
            ON Date.DateKey = Sales.DateKey
        INNER JOIN Dim.Date Today
            ON Today.DateKey = CAST(GETDATE() AS DATE)
            AND Today.Year = Date.Year
GROUP BY SellerID

It looks, if anything, more complicated than the original query, but the more the online database grows the more you will see the benefit. I've done an SQL Fiddle to demonstrate the advantages, it fills the live data with 10000 random sales records, then creates a warehouse (It may take a few seconds to build the schema). You should notice the execution time of the query on the warehouse is significantly faster (c.20x). It may not be 20x faster on the first run, but once the query plan has been cached for both queries the warehouse query is consistently 20x faster (has been for me anyway).

like image 69
GarethD Avatar answered Sep 20 '22 23:09

GarethD