Function to Calculate Median in SQL Server

People also ask

Is there a median function in SQL Server?

In SQL Server, there is no direct way yet to find the median of a range of numbers. However, developers used several work-arounds to get the median using several techniques like MIN MAX, AVG, ect… In SQL Server 2012, Microsoft introduced an analytic function PERCENTILE_CONT function.

How do you define median in SQL?

We calculate the median of the Distance from the demo table. SET @rowindex := -1; SELECT AVG(d. distance) as Median FROM (SELECT @rowindex:=@rowindex + 1 AS rowindex, demo.

How do you find the middle value in SQL?

SELECT @Median = AVG(1.0 * val) FROM ( SELECT o. val, rn = ROW_NUMBER() OVER (ORDER BY o.

How do you calculate mean median and mode in SQL Server?

Median is calculated by ordering all the data points from the series in ascending order and then picking out the middle data point from it. If there are two middle data points, then the median will be calculated as the mean of those two numbers.

If you're using SQL 2005 or better this is a nice, simple-ish median calculation for a single column in a table:

SELECT
(
 (SELECT MAX(Score) FROM
   (SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score) AS BottomHalf)
 +
 (SELECT MIN(Score) FROM
   (SELECT TOP 50 PERCENT Score FROM Posts ORDER BY Score DESC) AS TopHalf)
) / 2 AS Median

2019 UPDATE: In the 10 years since I wrote this answer, more solutions have been uncovered that may yield better results. Also, SQL Server releases since then (especially SQL 2012) have introduced new T-SQL features that can be used to calculate medians. SQL Server releases have also improved its query optimizer which may affect perf of various median solutions. Net-net, my original 2009 post is still OK but there may be better solutions on for modern SQL Server apps. Take a look at this article from 2012 which is a great resource: https://sqlperformance.com/2012/08/t-sql-queries/median

This article found the following pattern to be much, much faster than all other alternatives, at least on the simple schema they tested. This solution was 373x faster (!!!) than the slowest (PERCENTILE_CONT) solution tested. Note that this trick requires two separate queries which may not be practical in all cases. It also requires SQL 2012 or later.

DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.EvenRows);

SELECT AVG(1.0 * val)
FROM (
    SELECT val FROM dbo.EvenRows
     ORDER BY val
     OFFSET (@c - 1) / 2 ROWS
     FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
) AS x;

Of course, just because one test on one schema in 2012 yielded great results, your mileage may vary, especially if you're on SQL Server 2014 or later. If perf is important for your median calculation, I'd strongly suggest trying and perf-testing several of the options recommended in that article to make sure that you've found the best one for your schema.

I'd also be especially careful using the (new in SQL Server 2012) function PERCENTILE_CONT that's recommended in one of the other answers to this question, because the article linked above found this built-in function to be 373x slower than the fastest solution. It's possible that this disparity has been improved in the 7 years since, but personally I wouldn't use this function on a large table until I verified its performance vs. other solutions.

ORIGINAL 2009 POST IS BELOW:

There are lots of ways to do this, with dramatically varying performance. Here's one particularly well-optimized solution, from Medians, ROW_NUMBERs, and performance. This is a particularly optimal solution when it comes to actual I/Os generated during execution – it looks more costly than other solutions, but it is actually much faster.

That page also contains a discussion of other solutions and performance testing details. Note the use of a unique column as a disambiguator in case there are multiple rows with the same value of the median column.

As with all database performance scenarios, always try to test a solution out with real data on real hardware – you never know when a change to SQL Server's optimizer or a peculiarity in your environment will make a normally-speedy solution slower.

SELECT
   CustomerId,
   AVG(TotalDue)
FROM
(
   SELECT
      CustomerId,
      TotalDue,
      -- SalesOrderId in the ORDER BY is a disambiguator to break ties
      ROW_NUMBER() OVER (
         PARTITION BY CustomerId
         ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,
      ROW_NUMBER() OVER (
         PARTITION BY CustomerId
         ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc
   FROM Sales.SalesOrderHeader SOH
) x
WHERE
   RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CustomerId
ORDER BY CustomerId;

In SQL Server 2012 you should use PERCENTILE_CONT:

SELECT SalesOrderID, OrderQty,
    PERCENTILE_CONT(0.5) 
        WITHIN GROUP (ORDER BY OrderQty)
        OVER (PARTITION BY SalesOrderID) AS MedianCont
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC

See also : http://blog.sqlauthority.com/2011/11/20/sql-server-introduction-to-percentile_cont-analytic-functions-introduced-in-sql-server-2012/

My original quick answer was:

select  max(my_column) as [my_column], quartile
from    (select my_column, ntile(4) over (order by my_column) as [quartile]
         from   my_table) i
--where quartile = 2
group by quartile

This will give you the median and interquartile range in one fell swoop. If you really only want one row that is the median then uncomment the where clause.

When you stick that into an explain plan, 60% of the work is sorting the data which is unavoidable when calculating position dependent statistics like this.

I've amended the answer to follow the excellent suggestion from Robert Ševčík-Robajz in the comments below:

;with PartitionedData as
  (select my_column, ntile(10) over (order by my_column) as [percentile]
   from   my_table),
MinimaAndMaxima as
  (select  min(my_column) as [low], max(my_column) as [high], percentile
   from    PartitionedData
   group by percentile)
select
  case
    when b.percentile = 10 then cast(b.high as decimal(18,2))
    else cast((a.low + b.high)  as decimal(18,2)) / 2
  end as [value], --b.high, a.low,
  b.percentile
from    MinimaAndMaxima a
  join  MinimaAndMaxima b on (a.percentile -1 = b.percentile) or (a.percentile = 10 and b.percentile = 10)
--where b.percentile = 5

This should calculate the correct median and percentile values when you have an even number of data items. Again, uncomment the final where clause if you only want the median and not the entire percentile distribution.

Even better:

SELECT @Median = AVG(1.0 * val)
FROM
(
    SELECT o.val, rn = ROW_NUMBER() OVER (ORDER BY o.val), c.c
    FROM dbo.EvenRows AS o
    CROSS JOIN (SELECT c = COUNT(*) FROM dbo.EvenRows) AS c
) AS x
WHERE rn IN ((c + 1)/2, (c + 2)/2);

From the master Himself, Itzik Ben-Gan!

MS SQL Server 2012 (and later) has the PERCENTILE_DISC function which computes a specific percentile for sorted values. PERCENTILE_DISC (0.5) will compute the median - https://msdn.microsoft.com/en-us/library/hh231327.aspx

Simple, fast, accurate

SELECT x.Amount 
FROM   (SELECT amount, 
               Count(1) OVER (partition BY 'A')        AS TotalRows, 
               Row_number() OVER (ORDER BY Amount ASC) AS AmountOrder 
        FROM   facttransaction ft) x 
WHERE  x.AmountOrder = Round(x.TotalRows / 2.0, 0)

Related questions
                            
                                How to get script of SQL Server data? [duplicate]
                            
                                How to drop all tables in a SQL Server database?
                            
                                How can I determine installed SQL Server instances and their versions?
                            
                                How to write a foreach in SQL Server?
                            
                                Why use a READ UNCOMMITTED isolation level?
                            
                                What are the main performance differences between varchar and nvarchar SQL Server data types?
                            
                                How do I flush the PRINT buffer in TSQL?
                            
                                Find a string by searching all tables in SQL Server
                            
                                Select top 10 records for each category
                            
                                Exit single-user mode
                            
                                The ALTER TABLE statement conflicted with the FOREIGN KEY constraint
                            
                                How to get the connection String from a database
                            
                                SQL Server Management Studio, how to get execution time down to milliseconds
                            
                                When should I use semicolons in SQL Server?
                            
                                Add IIS 7 AppPool Identities as SQL Server Logons
                            
                                How to update Identity Column in SQL Server?
                            
                                Does MS SQL Server's "between" include the range boundaries?
                            
                                Convert Datetime column from UTC to local time in select statement
                            
                                When to use Common Table Expression (CTE)
                            
                                How can I solve a connection pool problem between ASP.NET and SQL Server?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Function to Calculate Median in SQL Server

Tags:

sql-server

tsql

aggregate-functions

median

People also ask

Recent Activity

Donate For Us