
Is pandas showing the wrong percentile?

I'm working with this WNBA dataset here. I'm analyzing the Height variable, and below is a table showing frequency, cumulative percentage, and cumulative frequency for each height value recorded:

[Image: table of frequency, cumulative percentage, and cumulative frequency for each Height value]

From the table I can easily conclude that the first quartile (the 25th percentile) cannot be larger than 175.

However, when I use Series.describe(), I'm told that the 25th percentile is 176.5. Why is that so?

wnba.Height.describe()
count    143.000000
mean     184.566434
std        8.685068
min      165.000000
25%      176.500000
50%      185.000000
75%      191.000000
max      206.000000
Name: Height, dtype: float64
Alex asked Feb 28 '18 08:02





1 Answer

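There are several ways to estimate quantiles from a sample.
The 175.0 vs. 176.5 discrepancy comes from two different methods:

  1. The inclusive method (this gives 176.5, the value describe() reports) and
  2. The exclusive method (this gives 175.0, consistent with your table).

Both interpolate linearly between neighbouring sorted values; they differ only in how the position h is computed: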

# x is the sorted sample, indexed from 1; N is the sample size

#1
h = (N − 1)*p + 1   # p being 0.25 in your case
Est_Quantile = x[⌊h⌋] + (h − ⌊h⌋)*(x[⌊h⌋ + 1] − x[⌊h⌋])

#2
h = (N + 1)*p
Est_Quantile = x[⌊h⌋] + (h − ⌊h⌋)*(x[⌊h⌋ + 1] − x[⌊h⌋])
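
To see the two formulas side by side, here is a minimal sketch in plain Python. The function names and the seven-value `heights` list are made up for illustration (they are not the WNBA data), and the clamping of h at the ends of the sample is my own assumption for edge cases. The standard library's `statistics.quantiles` offers the same two variants under the names `inclusive` and `exclusive`, so it serves as a cross-check.

```python
import math
from statistics import quantiles


def _interpolate(x, h):
    """Linear interpolation at 1-based position h in the sorted list x."""
    n = len(x)
    h = min(max(h, 1), n)      # assumption: clamp h to the valid range 1..n
    lo = math.floor(h)
    if lo >= n:                # h points at the last element
        return x[-1]
    # x[lo - 1] is the 1-based element x_lo, x[lo] is x_(lo + 1)
    return x[lo - 1] + (h - lo) * (x[lo] - x[lo - 1])


def quantile_inclusive(values, p):
    """Method #1: h = (N - 1) * p + 1."""
    x = sorted(values)
    return _interpolate(x, (len(x) - 1) * p + 1)


def quantile_exclusive(values, p):
    """Method #2: h = (N + 1) * p."""
    x = sorted(values)
    return _interpolate(x, (len(x) + 1) * p)


# Hypothetical sample, chosen only so that the two methods disagree.
heights = [165, 170, 175, 178, 180, 185, 190]

print(quantile_inclusive(heights, 0.25))   # 172.5 (h = 2.5)
print(quantile_exclusive(heights, 0.25))   # 170   (h = 2)

# Cross-check against the standard library (first cut point is Q1).
print(quantiles(heights, n=4, method="inclusive")[0])   # 172.5
print(quantiles(heights, n=4, method="exclusive")[0])   # 170.0

# Positions for the question's sample size, N = 143:
print((143 - 1) * 0.25 + 1)   # 36.5 -> halfway between the 36th and 37th values
print((143 + 1) * 0.25)       # 36.0 -> exactly the 36th value
```

With N = 143 the inclusive position falls halfway between the 36th and 37th sorted heights, while the exclusive position lands exactly on the 36th, which is why the two estimates can differ even though both are valid.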
Gaurav Taneja answered Nov 01 '22 15:11
