I use the query below to find the median for every sector:
SELECT DISTINCT Sector,
       PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value)
           OVER (PARTITION BY Sector) AS Median
FROM TABLE
The table has the following format:
Sector Date Value
A 2014-08-01 1
B 2014-08-01 5
C 2014-08-01 7
A 2014-08-02 6
B 2014-08-02 5
C 2014-08-02 4
A 2014-08-03 3
B 2014-08-03 9
C 2014-08-03 6
A 2014-08-04 5
B 2014-08-04 8
C 2014-08-04 9
A 2014-08-05 5
B 2014-08-05 7
C 2014-08-05 2
So I get the expected result:
Sector Median
A 5
B 7
C 6
Now I need to change the process so that the medians are calculated using only the records up to the given date. The new result would be:
Sector Date Median
A 2014-08-01 1
B 2014-08-01 5
C 2014-08-01 7 (Only 1 record each was considered for A, B and C)
A 2014-08-02 3.5
B 2014-08-02 5
C 2014-08-02 5.5 (2 records each were considered for A, B and C)
A 2014-08-03 3
B 2014-08-03 5
C 2014-08-03 6 (3 records each were considered for A, B and C)
A 2014-08-04 4
B 2014-08-04 6.5
C 2014-08-04 6.5 (4 records each were considered for A, B and C)
A 2014-08-05 5
B 2014-08-05 7
C 2014-08-05 6 (All 5 records each were considered for A, B and C)
So this is a sort of cumulative median. Can someone please tell me how to achieve this? My table has about 2.3M records, with about 1100 records each for about 1100 dates.
Please let me know if you need any more info.
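As a sanity check on the expected output above, here is a minimal Python sketch (not T-SQL) that computes the cumulative median per sector from the sample rows. The names ROWS and cumulative_medians are made up for illustration. It assumes one row per sector per date, and it averages the two middle values when the count is even, which is what the expected table does.

```python
from bisect import insort
from collections import defaultdict

# Sample rows from the question: (sector, date, value).
ROWS = [
    ("A", "2014-08-01", 1), ("B", "2014-08-01", 5), ("C", "2014-08-01", 7),
    ("A", "2014-08-02", 6), ("B", "2014-08-02", 5), ("C", "2014-08-02", 4),
    ("A", "2014-08-03", 3), ("B", "2014-08-03", 9), ("C", "2014-08-03", 6),
    ("A", "2014-08-04", 5), ("B", "2014-08-04", 8), ("C", "2014-08-04", 9),
    ("A", "2014-08-05", 5), ("B", "2014-08-05", 7), ("C", "2014-08-05", 2),
]


def cumulative_medians(rows):
    """Return {(sector, date): median of that sector's values up to that date}."""
    seen = defaultdict(list)  # sector -> sorted list of the values seen so far
    result = {}
    # ISO-formatted dates sort correctly as plain strings.
    for sector, date, value in sorted(rows, key=lambda r: r[1]):
        values = seen[sector]
        insort(values, value)  # insert while keeping the list sorted
        n = len(values)
        mid = n // 2
        if n % 2:  # odd count: the single middle value
            result[(sector, date)] = float(values[mid])
        else:      # even count: average of the two middle values
            result[(sector, date)] = (values[mid - 1] + values[mid]) / 2
    return result


medians = cumulative_medians(ROWS)
for (sector, date), median in sorted(medians.items(), key=lambda kv: (kv[0][1], kv[0][0])):
    print(sector, date, median)
```

For sector A this gives 1, 3.5, 3, 4 and 5 across the five dates, matching the table above.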
Another way is to create a triangular JOIN to get all the past values for every day, and use that as the data:
-- Triangular join: pair every row (t2) with all rows of the same sector
-- dated on or before it (t1), then take the median of those values.
;WITH T AS (
    SELECT t2.Sector, t2.[Date], t1.[Value]
    FROM Table1 t1
    LEFT JOIN Table1 t2 ON t1.Sector = t2.Sector AND t1.[Date] <= t2.[Date]
)
SELECT DISTINCT Sector
     , [Date]
     , PERCENTILE_CONT(0.5)
           WITHIN GROUP (ORDER BY [Value])
           OVER (PARTITION BY Sector, [Date]) AS Median
FROM T
ORDER BY [Date], Sector;
In the query I've replaced PERCENTILE_DISC with PERCENTILE_CONT to get the right median when the number of values is even, for example on the second day.
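To make the difference concrete, here is a small Python sketch (not T-SQL) of the two definitions as I understand them from the SQL Server documentation: the discrete version picks the smallest actual value whose cumulative distribution reaches the percentile, while the continuous version interpolates linearly between neighbouring rows. The helper names are hypothetical, and both assume a non-empty list.

```python
import math


def percentile_disc(values, p):
    """Smallest input value whose cumulative distribution is >= p."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered)), 1)  # 1-based position in sorted order
    return ordered[k - 1]


def percentile_cont(values, p):
    """Linear interpolation at the fractional position p * (n - 1), 0-based."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


sector_a_day2 = [1, 6]  # sector A's values up to 2014-08-02
print(percentile_disc(sector_a_day2, 0.5))  # 1
print(percentile_cont(sector_a_day2, 0.5))  # 3.5
```

With an odd number of values both functions return the same middle value, so the choice only matters on dates where the running count is even.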
That makes it harder, because the following does not work (SQL Server does not allow ORDER BY in the OVER clause of PERCENTILE_DISC):
SELECT DISTINCT Sector, Date,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Value) OVER (PARTITION BY sector ORDER BY DATE) AS Median
FROM TABLE;
Alas. You can use CROSS APPLY for this purpose:
-- OVER takes only PARTITION BY here; the WHERE clause restricts the rows
-- to the same sector up to the current row's date.
select t.sector, t.date, t.value, m.median
from table t cross apply
     (select top 1 PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY t2.Value) OVER (PARTITION BY t2.sector) AS Median
      from table t2
      where t2.sector = t.sector and t2.date <= t.date
     ) m;
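The per-row lookup that CROSS APPLY performs can be sketched in Python (not T-SQL) as follows. This is only an illustration of the idea under stated assumptions: ROWS repeats the question's sample data, and percentile_disc and running_disc_median are hypothetical helpers, with percentile_disc following the "smallest value whose cumulative distribution is >= p" definition.

```python
import math

# Sample rows from the question: (sector, date, value).
ROWS = [
    ("A", "2014-08-01", 1), ("B", "2014-08-01", 5), ("C", "2014-08-01", 7),
    ("A", "2014-08-02", 6), ("B", "2014-08-02", 5), ("C", "2014-08-02", 4),
    ("A", "2014-08-03", 3), ("B", "2014-08-03", 9), ("C", "2014-08-03", 6),
    ("A", "2014-08-04", 5), ("B", "2014-08-04", 8), ("C", "2014-08-04", 9),
    ("A", "2014-08-05", 5), ("B", "2014-08-05", 7), ("C", "2014-08-05", 2),
]


def percentile_disc(values, p):
    """Smallest input value whose cumulative distribution is >= p."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered)), 1)  # 1-based position in sorted order
    return ordered[k - 1]


def running_disc_median(rows):
    """For each row, the discrete median of same-sector rows dated on or before it."""
    out = []
    for sector, date, value in rows:
        # Plays the role of: where t2.sector = t.sector and t2.date <= t.date
        past = [v for s, d, v in rows if s == sector and d <= date]
        out.append((sector, date, value, percentile_disc(past, 0.5)))
    return out


result = running_disc_median(ROWS)
for row in result:
    print(*row)
```

Because this uses the discrete percentile, sector A on 2014-08-02 comes out as 1 rather than the 3.5 the question expects; a continuous percentile would be needed to match the expected table on even counts. The sketch also rescans the sector's rows for every row, which is quadratic per sector and may matter at the table size mentioned in the question.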