Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Server: A Grouping question that's annoying me

I've been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking...I'm not sure what the answer is!) one has me stumped. Feels like it should be an easy one, too. I'll generalize my problem:

Let's say I have 3 employees (don't worry about them quitting or anything...there's always 3), and I keep up with how I distribute their salaries on a monthly basis.

Month   Employee  PercentOfTotal
--------------------------------
1       Alice     25%
1       Barbara   65%
1       Claire    10%

2       Alice     25%
2       Barbara   50%
2       Claire    25%

3       Alice     25%
3       Barbara   65%
3       Claire    10%

As you can see, I've paid them the same percent in Months 1 and 3, but in Month 2, I've given Alice the same 25%, but Barbara got 50% and Claire got 25%.

What I want to know is all the distinct distributions I've ever given. In this case there would be two -- one for months 1 and 3, and one for month 2.

I'd expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter)

ID      Employee  PercentOfTotal
--------------------------------
X       Alice     25%
X       Barbara   65%
X       Claire    10%

Y       Alice     25%
Y       Barbara   50%
Y       Claire    25%

Seems easy, right? I'm stumped! Anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I'm wondering if there's a better way. Or maybe a different way from which I'll learn something.

WITH temp_ids (Month)
AS
(
  SELECT DISTINCT MIN(Month)
    FROM employees_paid
  GROUP BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
  FROM employees_paid EMP
         JOIN temp_ids IDS ON EMP.Month = IDS.Month
GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal

Thanks y'all! -Ricky

like image 463
user366729 Avatar asked Dec 13 '22 21:12

user366729


2 Answers

This gives you an answer in a slightly different format than you requested:

SELECT DISTINCT
    T1.PercentOfTotal AS Alice,
    T2.PercentOfTotal AS Barbara,
    T3.PercentOfTotal AS Claire
FROM employees_paid T1
JOIN employees_paid T2
  ON T1.Month = T2.Month AND T1.Employee = 'Alice' AND T2.Employee = 'Barbara'
JOIN employees_paid T3
  ON T2.Month = T3.Month AND T3.Employee = 'Claire'

Result:

Alice   Barbara  Claire
25%     50%      25%
25%     65%      10%

If you want to, you can use UNPIVOT to turn this result set into the form you asked for.

SELECT rn AS ID, Employee, PercentOfTotal
FROM (
    SELECT *, ROW_NUMBER() OVER (ORDER BY Alice) AS rn
    FROM (
        SELECT DISTINCT
            T1.PercentOfTotal AS Alice,
            T2.PercentOfTotal AS Barbara,
            T3.PercentOfTotal AS Claire
        FROM employees_paid T1
        JOIN employees_paid T2 ON T1.Month = T2.Month AND T1.Employee = 'Alice'
                                                      AND T2.Employee = 'Barbara'
        JOIN employees_paid T3 ON T2.Month = T3.Month AND T3.Employee = 'Claire'
    ) T1
) p UNPIVOT (PercentOfTotal FOR Employee IN (Alice, Barbara, Claire)) AS unpvt

Result:

ID  Employee  PercentOfTotal  
1   Alice     25%
1   Barbara   50%      
1   Claire    25%             
2   Alice     25%             
2   Barbara   65%              
2   Claire    10%               
like image 175
Mark Byers Avatar answered Jan 05 '23 11:01

Mark Byers


What you want is for each month's distribution to act as a signature or pattern of values which you would then want to find in other months. What is not clear is whether the employee to whom the value went is as important as the break down of percentages. For example, would Alice=65%, Barbara=25%, Claire=10% be the same as the Month 3 in your example? In my example, I presumed that it would not be the same. Similar to Martin Smith's solution, I find the signatures by multiplying each percentage by 10. This presumes that all percentage values are less than one. If someone could have a percentage of 110% for example, that would create problems for this solution.

With Employees As
    (
    Select 1 As Month, 'Alice' As Employee, .25 As PercentOfTotal
    Union All Select 1, 'Barbara', .65
    Union All Select 1, 'Claire', .10
    Union All Select 2, 'Alice', .25
    Union All Select 2, 'Barbara', .50
    Union All Select 2, 'Claire', .25
    Union All Select 3, 'Alice', .25
    Union All Select 3, 'Barbara', .65
    Union All Select 3, 'Claire', .10
    )
    , EmployeeRanks As
    (
    Select Month, Employee, PercentOfTotal
        , Row_Number() Over ( Partition By Month Order By Employee, PercentOfTotal ) As ItemRank
    From Employees
    )
    , Signatures As
    (
    Select Month
        , Sum( PercentOfTotal * Cast( Power( 10, ItemRank ) As bigint) ) As SignatureValue
    From EmployeeRanks
    Group By Month
    )
    , DistinctSignatures As
    (
    Select Min(Month) As MinMonth, SignatureValue
    From Signatures
    Group By SignatureValue
    )
Select E.Month, E.Employee, E.PercentOfTotal
From Employees As E
    Join DistinctSignatures As D
        On D.MinMonth = E.Month
like image 30
Thomas Avatar answered Jan 05 '23 12:01

Thomas