Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I aggregate by the top N categories with an "all others" and totals?

I have tables that list a user's sales by Categories (each sale has at least one and may have several categories).

I can get the top categories for a user, but I need statistics for a user by both his/her top N categories and the remainder.

I've boiled the problem down to an MCVE as follows...

MCVE Data Summary:

Salesman    SaleID    Amount    Categories
--------    ------    ------    ------------------------------
     1         1         2      Service
     2         2         2      Software, Support_Contract
     2         3         3      Service
     2         4         1      Parts, Service, Software
     2         5         3      Support_Contract
     2         6         4      Promo_Gift, Support_Contract
     2         7        -2      Rebate, Support_Contract
     3         8         2      Software, Support_Contract
     3         9         3      Service
     3        10         1      Parts, Software
     3        11         3      Support_Contract
     3        12         4      Promo_Gift, Support_Contract
     3        13        -2      Rebate, Support_Contract

MCVE setup SQL:

CREATE TABLE Sales      ([Salesman] int, [SaleID] int, [Amount] int);
CREATE TABLE SalesTags  ([SaleID] int, [TagId] int);
CREATE TABLE Tags       ([TagId] int, [TagName] varchar(100) );

INSERT INTO Sales
    ([Salesman], [SaleID], [Amount])
VALUES
    (1, 1, 2),        (2, 6, 4),        (3, 10, 1),
    (2, 2, 2),        (2, 7, -2),       (3, 11, 3),
    (2, 3, 3),        (3, 8, 2),        (3, 12, 4),
    (2, 4, 1),        (3, 9, 3),        (3, 13, -2),
    (2, 5, 3)
;
INSERT INTO SalesTags
    ([SaleID], [TagId])
VALUES
    (1, 3),           (6, 4),           (10, 1),
    (2, 1),           (6, 5),           (10, 2),
    (2, 4),           (7, 4),           (11, 4),
    (3, 3),           (7, 6),           (12, 4),
    (4, 1),           (8, 1),           (12, 5),
    (4, 2),           (8, 4),           (13, 4),
    (4, 3),           (9, 3),           (13, 6),
    (5, 4)
;
INSERT INTO Tags
    ([TagId], [TagName])
VALUES
    (1, 'Software'),
    (2, 'Parts'),
    (3, 'Service'),
    (4, 'Support_Contract'),
    (5, 'Promo_Gift'),
    (6, 'Rebate')
;


See this SQL Fiddle, I can get the user's top N tags like:

WITH usersSales AS (  -- actual base CTE is much more complex
    SELECT  s.SaleID
            , s.Amount
    FROM    Sales s
    WHERE   s.Salesman = 2
)
SELECT Top 3  -- N can be 3 to 10
            t.TagName
            , COUNT (us.SaleID)     AS tagSales
            , SUM (us.Amount)       AS tagAmount
FROM        usersSales us
INNER JOIN  SalesTags st    ON st.SaleID = us.SaleID
INNER JOIN  Tags t          ON t.TagId   = st.TagId
GROUP BY    t.TagName
ORDER BY    tagAmount DESC
            , tagSales DESC
            , t.TagName

-- which reveals the user's top categories to be:

  1. "Support_Contract"
  2. "Service"
  3. "Promo_Gift"

in that order, for user 2. (And Support_Contract, Promo_Gift, Software for user 3.)

But the Results Needed, for N=3, are:

  • User 2:

    Top Category        Amount    Number of Sales
    ----------------    ------    ---------------
    Support Contract       7             4
    Service                4             2
    Promo Gift             0             0
    - All Others -         0             0
    ============================================
    Totals                11             6
    
  • User 3:

    Top Category        Amount    Number of Sales
    ----------------    ------    ---------------
    Support Contract       7             4
    Promo_Gift             0             0
    Software               1             1
    - All Others -         3             1
    ============================================
    Totals                11             6
    

Where:

  1. Top Category is the user's highest ranked category (per the query above), for a given sale.
  2. The Top Category of row 2 excludes sales already accounted for in row 1.
  3. The Top Category of row 3 excludes sales already accounted for in rows 1 and 2.
  4. Etc.
  5. All leftover sales, not accounted for in the top N categories are lumped into the - All Others - group.
  6. The totals at the bottom match the user's overall sales figures.

How do I aggregate results like that?

Note that this is running on MS SQL-Server 2017 and I can't change the table schema.

like image 431
Brock Adams Avatar asked Jul 10 '18 03:07

Brock Adams


1 Answers

Here is one way to do it. Run the query step-by-step, CTE-by-CTE and examine intermediate results to understand how it works.

It is not the most efficient method, because I ended up joining the table to itself to eliminate sales that were summed up before, but I can't figure out how to avoid it at the moment.

WITH usersSales 
AS 
(  -- actual base CTE is much more complex
    SELECT
        s.SaleID
        , s.Amount
    FROM Sales s
    WHERE s.Salesman = 2
)
,CTE_Sums
AS
(
    SELECT
        t.TagName
        ,us.Amount
        ,us.SaleID
        ,SUM(us.Amount) OVER (PARTITION BY t.TagName) AS TagAmount
        ,COUNT(*) OVER (PARTITION BY t.TagName) AS TagSales
    FROM
        usersSales us
        INNER JOIN SalesTags st ON st.SaleID = us.SaleID
        INNER JOIN Tags t ON t.TagId = st.TagId
)
,CTE_Rank
AS
(
    SELECT
        TagName
        ,Amount
        ,SaleID
        ,TagAmount
        ,TagSales
        ,DENSE_RANK() OVER (ORDER BY TagAmount DESC, TagSales DESC, TagName) AS rnk
    FROM CTE_Sums
)
,CTE_Final
AS
(
    SELECT
        Main.TagName
        ,Main.Amount
        ,Main.SaleID
        ,Main.TagAmount
        ,Main.TagSales
        ,Main.rnk
        ,ISNULL(A.FinalTagAmount, 0) AS FinalTagAmount
        ,A.FinalTagSales
    FROM
        CTE_Rank AS Main
        OUTER APPLY
        (
            SELECT
                SUM(Detail.Amount) AS FinalTagAmount
                ,COUNT(*) AS FinalTagSales
            FROM CTE_Rank AS Detail
            WHERE
                Detail.rnk = Main.rnk
                AND Detail.SaleID NOT IN
                (
                    SELECT PrevRanks.SaleID
                    FROM CTE_Rank AS PrevRanks
                    WHERE PrevRanks.rnk < Detail.rnk
                )
        ) AS A
)
SELECT
    TagName
    ,MIN(FinalTagAmount) AS FinalTagAmount
    ,MIN(FinalTagSales) AS FinalTagSales
    ,rnk
    ,0 AS SortOrder
FROM CTE_Final
WHERE rnk <= 3
GROUP BY
    TagName
    ,rnk

UNION ALL

SELECT
    '- All Others -' AS TagName
    ,SUM(FinalTagAmount) AS FinalTagAmount
    ,SUM(FinalTagSales) AS FinalTagSales
    ,0 AS rnk
    ,1 AS SortOrder
FROM CTE_Final
WHERE rnk > 3

ORDER BY
    SortOrder
    ,rnk
;

CTE_Rank

Do not group and sum up rows yet, but use window aggregates to get the rank of each tag. We'll need individual rows (SaleID) with individual amounts later for filtering those that are in use.

+------------------+--------+--------+-----------+----------+-----+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk |
+------------------+--------+--------+-----------+----------+-----+
| Support Contract |     -2 |      7 |         7 |        4 |   1 |
| Support Contract |      3 |      5 |         7 |        4 |   1 |
| Support Contract |      4 |      6 |         7 |        4 |   1 |
| Support Contract |      2 |      2 |         7 |        4 |   1 |
| Service          |      1 |      4 |         4 |        2 |   2 |
| Service          |      3 |      3 |         4 |        2 |   2 |
| Promo Gift       |      4 |      6 |         4 |        1 |   3 |
| Software         |      1 |      4 |         3 |        2 |   4 |
| Software         |      2 |      2 |         3 |        2 |   4 |
| Parts            |      1 |      4 |         1 |        1 |   5 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |
+------------------+--------+--------+-----------+----------+-----+

CTE_Final

OUTER APPLY does the main calculation with filtering of those sales that were encountered in the higher ranked tags.

+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
|     TagName      | Amount | SaleID | TagAmount | TagSales | rnk | FinalTagAmount | FinalTagSales |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+
| Support Contract |     -2 |      7 |         7 |        4 |   1 |              7 |             4 |
| Support Contract |      3 |      5 |         7 |        4 |   1 |              7 |             4 |
| Support Contract |      4 |      6 |         7 |        4 |   1 |              7 |             4 |
| Support Contract |      2 |      2 |         7 |        4 |   1 |              7 |             4 |
| Service          |      1 |      4 |         4 |        2 |   2 |              4 |             2 |
| Service          |      3 |      3 |         4 |        2 |   2 |              4 |             2 |
| Promo Gift       |      4 |      6 |         4 |        1 |   3 |              0 |             0 |
| Software         |      1 |      4 |         3 |        2 |   4 |              0 |             0 |
| Software         |      2 |      2 |         3 |        2 |   4 |              0 |             0 |
| Parts            |      1 |      4 |         1 |        1 |   5 |              0 |             0 |
| Rebate           |     -2 |      7 |        -2 |        1 |   6 |              0 |             0 |
+------------------+--------+--------+-----------+----------+-----+----------------+---------------+

Query Result

Simply put together top 3 ranked tags plus all the rest.

+------------------+----------------+---------------+-----+-----------+
|     TagName      | FinalTagAmount | FinalTagSales | rnk | SortOrder |
+------------------+----------------+---------------+-----+-----------+
| Support Contract |              7 |             4 |   1 |         0 |
| Service          |              4 |             2 |   2 |         0 |
| Promo Gift       |              0 |             0 |   3 |         0 |
| - All Others -   |              0 |             0 |   0 |         1 |
+------------------+----------------+---------------+-----+-----------+
like image 153
Vladimir Baranov Avatar answered Nov 02 '22 15:11

Vladimir Baranov