Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Use percentile_cont with a "group by" statment in T-SQL

I'd like to use the percentile_cont function to get median values in T-SQL. However, I also need to get mean values as well. I'd like to do something like the following:

SELECT  CustomerID ,
    AVG(Expenditure) AS MeanSpend , percentile_cont
    ( .5) WITHIN GROUP(ORDER BY Expenditure) OVER( ) AS MedianSpend
FROM    Customers
GROUP BY CustomerID

Can this be accomplished? I know I can use the OVER clause to group the percentile_cont results...

but then I'm stuck using two queries, am I not?

like image 839
John Chrysostom Avatar asked Oct 23 '13 15:10

John Chrysostom


People also ask

What does PERCENTILE_CONT do in SQL?

The definition from Microsoft for PERCENTILE_CONT is: “Calculates a percentile based on a continuous distribution of the column value in SQL Server. The result is interpolated and might not be equal to any of the specific values in the column.”

What is the difference between PERCENTILE_CONT and PERCENTILE_DISC?

PERCENTILE_CONT returns a computed result after doing linear interpolation. PERCENTILE_DISC simply returns a value from the set of values that are aggregated over.

What is Percent Rank in SQL?

The PERCENT_RANK function computes the rank of the employee's salary within a department as a percentage. The PARTITION BY clause is specified to partition the rows in the result set by department. The ORDER BY clause in the OVER clause orders the rows in each partition.


2 Answers

Just figured it out... gotta drop the group by and give both aggregation functions a over statement.

SELECT CustomerID,
    AVG(Expenditure) OVER(PARTITION BY CustomerID) AS MeanSpend,
    percentile_cont(.5) WITHIN GROUP(ORDER BY Expenditure) OVER(PARTITION BY CustomerID) AS MedianSpend
FROM Customers
like image 82
John Chrysostom Avatar answered Oct 30 '22 23:10

John Chrysostom


You can't use "group by" with window functions. These functions return the aggregated values for every row. One way is to use "select distinct" to get rid of the duplicate rows. Just make sure you partition each window function by the non-aggregated columns (groupId in this example).

--Generate test data
SELECT  TOP(10) 
    value.number%3  AS  groupId
,   value.number    AS  number
INTO    #data
FROM  master.dbo.spt_values  AS  value
WHERE value."type" = 'P' 
ORDER BY NEWID()
;

--View test data
SELECT  * FROM #data ORDER BY groupId,number;

--CALCULATE MEDIAN
SELECT DISTINCT 
    groupId
,   AVG(number)                                         OVER(PARTITION BY groupId)  AS mean
,   percentile_cont(.5) WITHIN GROUP(ORDER BY number)   OVER(PARTITION BY groupId)  AS median
FROM    #data
;

--Clean up
DROP TABLE #data;
like image 37
jumxozizi Avatar answered Oct 30 '22 21:10

jumxozizi