I'm working with this WNBA dataset here. I'm analyzing the Height variable, and below is a table showing frequency, cumulative percentage, and cumulative frequency for each height value recorded:
From the table I can easily conclude that the first quartile (the 25th percentile) cannot be larger than 175.
However, when I use Series.describe(), I'm told that the 25th percentile is 176.5. Why is that so?
wnba.Height.describe()
count 143.000000
mean 184.566434
std 8.685068
min 165.000000
25% 176.500000
50% 185.000000
75% 191.000000
max 206.000000
Name: Height, dtype: float64
In the output of describe(), mean is the average value, std is the standard deviation, min is the minimum value, and 25% and 50% are the 25th and 50th percentiles.
Percentiles are given as percent values, such as 95%, 40%, or 27%. Quantiles are given as decimal values, such as 0.95, 0.4, and 0.27.
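To make the percentile/quantile naming concrete, here is a small sketch with pandas and NumPy. The heights list is made up for illustration (it is not the WNBA data), and the variable names are my own. It only shows that the same cut point can be requested as the quantile 0.25 or as the 25th percentile.

```python
import numpy as np
import pandas as pd

# Made-up heights for illustration only, NOT the WNBA data.
heights = pd.Series([165, 170, 175, 178, 180, 183, 185, 188, 191, 196, 206])

# Quantile notation: a decimal between 0 and 1.
q_from_quantile = heights.quantile(0.25)

# Percentile notation: a number between 0 and 100.
q_from_percentile = np.percentile(heights, 25)

# describe() labels the same value with the percent form "25%".
q_from_describe = heights.describe()["25%"]

print(q_from_quantile, q_from_percentile, q_from_describe)  # 176.5 176.5 176.5
```

All three calls use linear interpolation by default, which is why they agree with each other.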
There are various ways to estimate the quantiles.
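The 175.0 vs 176.5 difference comes from two different estimation methods. In both, x is the data sorted in ascending order and indexed from 1, N is the number of values, and p is the quantile (0.25 in your case):
#1 (linear interpolation, the pandas default; with N = 143 this gives h = 36.5)
h = (N − 1)*p + 1
Est_Quantile = x[⌊h⌋] + (h − ⌊h⌋)*(x[⌊h⌋ + 1] − x[⌊h⌋])
#2 (with N = 143 this gives h = 36, so no interpolation takes place)
h = (N + 1)*p
Est_Quantile = x[⌊h⌋] + (h − ⌊h⌋)*(x[⌊h⌋ + 1] − x[⌊h⌋])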
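The two formulas above can be sketched in plain Python as follows. This is only an illustration: the function name quantile_estimate and the heights list are made up (the list is not the WNBA data, it was just chosen so the two methods disagree in the same way as in the question). As a cross-check, the standard library's statistics.quantiles is used, whose "inclusive" and "exclusive" methods correspond to #1 and #2 respectively.

```python
import math
import statistics


def quantile_estimate(sorted_x, p, method):
    """Estimate the p-quantile of data that is already sorted ascending.

    method 1: h = (N - 1) * p + 1
    method 2: h = (N + 1) * p
    Positions are 1-based, as in the formulas.
    """
    n = len(sorted_x)
    h = (n - 1) * p + 1 if method == 1 else (n + 1) * p
    h = min(max(h, 1), n)      # keep h inside the valid positions 1..N
    lo = math.floor(h)         # position of the lower neighbour
    hi = min(lo + 1, n)        # position of the upper neighbour
    x_lo = sorted_x[lo - 1]    # convert 1-based position to 0-based index
    x_hi = sorted_x[hi - 1]
    return x_lo + (h - lo) * (x_hi - x_lo)


# Made-up heights for illustration only, NOT the WNBA data.
heights = sorted([165, 170, 175, 178, 180, 183, 185, 188, 191, 196, 206])

q1_method1 = quantile_estimate(heights, 0.25, method=1)
q1_method2 = quantile_estimate(heights, 0.25, method=2)

print(q1_method1)  # 176.5  (h = 3.5, halfway between 175 and 178)
print(q1_method2)  # 175.0  (h = 3, lands exactly on the third value)

# Cross-check against the standard library.
print(statistics.quantiles(heights, n=4, method="inclusive")[0])  # 176.5
print(statistics.quantiles(heights, n=4, method="exclusive")[0])  # 175.0
```

With this toy data, method #1 interpolates between two neighbouring heights while method #2 lands exactly on an observed value, which is the same kind of gap as 176.5 vs 175.0 in the question.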