
How should the interquartile range be calculated in Python?

I have a list of numbers, [1, 2, 3, 4, 5, 6, 7], and I want a function that returns the interquartile range of this list. The interquartile range is the difference between the upper and lower quartiles. I have tried calculating it by hand, with NumPy functions, and with Wolfram Alpha, and all three answers are different. I do not know why this is.

My attempt in Python is as follows:

>>> import numpy
>>> a = numpy.array([1, 2, 3, 4, 5, 6, 7])
>>> numpy.percentile(a, 25)
2.5
>>> numpy.percentile(a, 75)
5.5
>>> numpy.percentile(a, 75) - numpy.percentile(a, 25) # IQR
3.0

My attempt in Wolfram Alpha is as follows:

  • "first quartile 1, 2, 3, 4, 5, 6, 7": 2.25
  • "third quartile 1, 2, 3, 4, 5, 6, 7": 5.75
  • (comment: 5.75 - 2.25 = 3.5)
  • "interquartile range 1, 2, 3, 4, 5, 6, 7": ~3.5

So, I find that the values returned by NumPy and Wolfram Alpha for what I think are the first quartile, the third quartile and the interquartile range are not consistent. Why is this? What should I be doing in Python to calculate the interquartile range correctly?

As far as I am aware, the interquartile range of [1, 2, 3, 4, 5, 6, 7] should be the following:

median(5, 6, 7) - median(1, 2, 3) = 4.
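That hand calculation can be sketched in plain Python. This is only an illustration of the "median of each half" convention: the function name iqr_median_of_halves is hypothetical, and dropping the middle element for odd-length input is an assumption chosen to match the arithmetic above.

```python
from statistics import median


def iqr_median_of_halves(values):
    """IQR as median(upper half) - median(lower half).

    Assumption: for an odd number of values, the middle element is
    left out of both halves, as in the hand calculation above.
    """
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]    # smallest `half` values
    upper = data[-half:]   # largest `half` values
    return median(upper) - median(lower)


print(iqr_median_of_halves([1, 2, 3, 4, 5, 6, 7]))  # 4
```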
d3pd asked Dec 14 '14 18:12



1 Answer

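Version 1.9 of NumPy added a handy 'interpolation' argument to numpy.percentile that lets you get to 4: take the third quartile with interpolation='higher' and the first quartile with interpolation='lower', so each quartile snaps to an actual data point.

import numpy
a = numpy.array([1, 2, 3, 4, 5, 6, 7])
# Q3 rounds up to 6, Q1 rounds down to 2, so the result is 4
numpy.percentile(a, 75, interpolation='higher') - numpy.percentile(a, 25, interpolation='lower')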
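The three results in the question differ because each one places a quartile between data points using a different convention, and none of them is uniquely correct. The sketch below compares a few of those conventions on the same list. It assumes NumPy 1.22 or later, where the 'interpolation' argument is named 'method'; the helper name iqr is hypothetical.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7])


def iqr(data, q1_method, q3_method):
    """IQR with an explicit percentile convention for each quartile.

    Assumes NumPy >= 1.22, which accepts the `method` keyword.
    """
    q1 = np.percentile(data, 25, method=q1_method)
    q3 = np.percentile(data, 75, method=q3_method)
    return float(q3 - q1)


iqr_linear = iqr(a, "linear", "linear")  # NumPy default: 5.5 - 2.5
iqr_hazen = iqr(a, "hazen", "hazen")     # 5.75 - 2.25
iqr_halves = iqr(a, "lower", "higher")   # 6 - 2

print(iqr_linear, iqr_hazen, iqr_halves)  # 3.0 3.5 4.0
```

For this particular input, 'hazen' happens to reproduce the Wolfram Alpha figures (2.25 and 5.75). That is an observation about these numbers, not a statement about which method Wolfram Alpha documents.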
warner121 answered Oct 25 '22 01:10