Finding the median in SQL Server up to every date in the table

I use the query below to find the median for every sector:

SELECT DISTINCT Sector,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value)
           OVER (PARTITION BY Sector) AS Median
FROM TABLE

The table is in the format below:

    Sector  Date    Value
    A   2014-08-01  1
    B   2014-08-01  5
    C   2014-08-01  7
    A   2014-08-02  6
    B   2014-08-02  5
    C   2014-08-02  4
    A   2014-08-03  3
    B   2014-08-03  9
    C   2014-08-03  6
    A   2014-08-04  5
    B   2014-08-04  8
    C   2014-08-04  9
    A   2014-08-05  5
    B   2014-08-05  7
    C   2014-08-05  2   

With this I get the expected result:

    Sector  Median
    A   5
    B   7
    C   6

Now I need to change the process so that the medians are calculated while considering only the records up to the given date. The new result would be:

    Sector  Date    Median
    A   2014-08-01  1
    B   2014-08-01  5
    C   2014-08-01  7 (Only 1 record each was considered for A, B and C)

    A   2014-08-02  3.5
    B   2014-08-02  5
    C   2014-08-02  5.5 (2 records each were considered for A, B and C)

    A   2014-08-03  3
    B   2014-08-03  5
    C   2014-08-03  6 (3 records each were considered for A, B and C)

    A   2014-08-04  4
    B   2014-08-04  6.5
    C   2014-08-04  6.5 (4 records each were considered for A, B and C)

    A   2014-08-05  5
    B   2014-08-05  7
    C   2014-08-05  6 (All 5 records each were considered for A, B and C)

So this would be a sort of cumulative median. Can someone please tell me how to achieve this? My table has about 2.3M records, with about 1100 records each for about 1100 dates.

Please let me know if you need any info.

John asked Aug 29 '14 13:08


2 Answers

Another way is to use a triangular JOIN to collect all the past values for every day and use that as the input data:

;With T AS (
  SELECT t2.Sector, t2.[Date], t1.[Value]
  FROM   Table1 t1
         LEFT  JOIN Table1 t2 ON t1.Sector = t2.Sector and t1.[Date] <= t2.[Date]
)
SELECT DISTINCT Sector
     , [Date]
     , PERCENTILE_CONT(0.5) 
         WITHIN GROUP (ORDER BY [Value]) 
         OVER (PARTITION BY sector, [Date]) AS Median 
FROM   T
ORDER BY [Date], Sector;


In the query I've replaced PERCENTILE_DISC with PERCENTILE_CONT to get the right median when there is an even number of values, as on the second day.
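To see why the choice of function matters, here is a minimal sketch that uses only the first two values of sector A from the question (1 and 6). The derived table `x` and its column `v` are made-up names for illustration; nothing else is assumed beyond SQL Server 2012 or later.

```sql
-- Two values only: sector A on 2014-08-01 and 2014-08-02.
-- x and v are hypothetical names used for this illustration.
SELECT DISTINCT
       -- DISC returns an actual value from the set: the first one whose
       -- cumulative distribution reaches 0.5, which is 1 here.
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY v) OVER () AS DiscMedian,
       -- CONT interpolates between the two middle values: (1 + 6) / 2 = 3.5.
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY v) OVER () AS ContMedian
FROM (VALUES (1), (6)) AS x(v);
```

DiscMedian comes back as 1 and ContMedian as 3.5, and 3.5 is the value the question expects for sector A on the second day.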

Serpiton answered Oct 31 '22 09:10


A cumulative median is harder, because PERCENTILE_DISC does not accept an ORDER BY inside its OVER clause, so the following does not work:

SELECT DISTINCT Sector, Date,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value) OVER (PARTITION BY sector ORDER BY DATE) AS Median
FROM TABLE;

Alas. You can use cross apply for this purpose:

-- The OVER clause has PARTITION BY only; the date restriction is done in the WHERE clause.
select t.sector, t.date, t.value, m.median
from table t cross apply
     (select top 1 PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY t2.Value) OVER (PARTITION BY t2.sector) AS Median
      from table t2
      where t2.sector = t.sector and t2.date <= t.date
     ) m;
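
As a quick check of the CROSS APPLY approach, the following sketch runs it against the sector A rows from the question. The table variable `@t` is a stand-in for the real table, and PERCENTILE_CONT is swapped in for PERCENTILE_DISC so that even row counts are averaged as the question expects. Treat it as an illustration under those assumptions rather than a tuned solution for 2.3M rows.

```sql
-- Hypothetical stand-in for the real table: sector A rows from the question.
DECLARE @t TABLE (Sector char(1), [Date] date, [Value] int);

INSERT INTO @t (Sector, [Date], [Value]) VALUES
    ('A', '2014-08-01', 1),
    ('A', '2014-08-02', 6),
    ('A', '2014-08-03', 3),
    ('A', '2014-08-04', 5),
    ('A', '2014-08-05', 5);

SELECT t.Sector, t.[Date], t.[Value], m.Median
FROM @t AS t
CROSS APPLY (
    -- Every row in this subquery carries the same median, so TOP (1)
    -- just picks one copy of it.
    SELECT TOP (1)
           PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY t2.[Value])
               OVER (PARTITION BY t2.Sector) AS Median
    FROM @t AS t2
    WHERE t2.Sector = t.Sector
      AND t2.[Date] <= t.[Date]   -- only rows up to the current date
) AS m
ORDER BY t.[Date];
```

For these five rows the Median column should read 1, 3.5, 3, 4, 5, which matches the values listed for sector A in the question.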

Gordon Linoff answered Oct 31 '22 11:10