
Interquartile Range - Lower, Upper and Median

I'm trying to work out the quartiles that make up the interquartile range for an array of numbers, which can be any length, e.g.

1,  1,  5,  6,  7,  8,  2,  4,  7,  9,  9,  9,  9

The values that I need to work out from this interquartile range are:

  • Upper Quartile
  • Median
  • Lower Quartile

If I dump the above array of numbers into Microsoft Excel (cells A1:M1), then I can use the following formulas:

  • =QUARTILE.INC(A1:M1,1)
  • =QUARTILE.INC(A1:M1,2)
  • =QUARTILE.INC(A1:M1,3)

To get my answers of:

  • 4 (lower quartile)
  • 7 (median)
  • 9 (upper quartile)

I now need to work out these 3 values in either SQL Server or VB.NET. I can get the array values in any format or object in either of these languages, but I can't find any functions that exist like the QUARTILE.INC function that Excel has.

Does anyone know how this could be achieved in either SQL Server or VB.NET?

ca8msm asked Apr 29 '15 13:04



1 Answer

There might be an easier way, but to get quartiles you can use NTILE (Transact-SQL), which the documentation describes as follows:

Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.

So for your data:

SELECT  1 Val
INTO    #temp
UNION ALL
SELECT  1
UNION ALL
SELECT  5
UNION ALL
SELECT  6
UNION ALL
SELECT  7
UNION ALL
SELECT  8
UNION ALL
SELECT  2
UNION ALL
SELECT  4
UNION ALL
SELECT  7
UNION ALL
SELECT  9
UNION ALL
SELECT  9
UNION ALL
SELECT  9
UNION ALL
SELECT  9

-- NTILE(4) specifies you require 4 partitions (quartiles)
SELECT  NTILE(4) OVER ( ORDER BY Val ) AS Quartile ,
        Val
INTO #tempQuartiles
FROM    #temp

SELECT * 
FROM #tempQuartiles

DROP TABLE #temp
DROP TABLE #tempQuartiles

This would produce:

Quartile    Val
1           1
1           1
1           2
1           4
2           5
2           6
2           7
3           7
3           8
3           9
4           9
4           9
4           9

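From this you can work out what you're after: the highest value in each of the first three groups.

So, modifying the final SELECT (and running it before the DROP TABLE statements), you can do this: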

SELECT Quartile, MAX(Val) MaxVal
FROM #tempQuartiles
WHERE Quartile <= 3
GROUP BY Quartile

To produce:

Quartile    MaxVal
1           4
2           7
3           9
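
Note that taking the maximum of each NTILE group gives 4, 7 and 9 for this particular data, but it does not interpolate between values, so it will not agree with QUARTILE.INC for every input. As an alternative, here is a minimal sketch using PERCENTILE_CONT, which interpolates linearly between neighbouring rows in the same way as QUARTILE.INC. It assumes SQL Server 2012 or later, and the table variable @Vals and the column aliases are made up for illustration:

```sql
-- Sketch only: @Vals is a hypothetical table variable holding the sample data.
-- PERCENTILE_CONT requires SQL Server 2012 or later.
DECLARE @Vals TABLE (Val INT);

INSERT INTO @Vals (Val)
VALUES (1), (1), (5), (6), (7), (8), (2), (4), (7), (9), (9), (9), (9);

-- PERCENTILE_CONT is a window function, so it returns the same value on
-- every row; DISTINCT collapses the result to a single row.
SELECT DISTINCT
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY Val) OVER () AS LowerQuartile,
    PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY Val) OVER () AS Median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY Val) OVER () AS UpperQuartile
FROM @Vals;
```

With 13 values, the 25%, 50% and 75% positions fall exactly on the 4th, 7th and 10th sorted rows, so no interpolation is needed for this data set and the query should return the same 4, 7 and 9 as Excel.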
Tanner answered Sep 29 '22 21:09