Is pandas showing the wrong percentile?

Tags: python, pandas, statistics

I'm working with this WNBA dataset here. I'm analyzing the Height variable, and below is a table showing frequency, cumulative percentage, and cumulative frequency for each height value recorded:

From the table I can easily conclude that the first quartile (the 25th percentile) cannot be larger than 175.
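Reading a quartile off the cumulative-percentage column like this is the nearest-rank definition: Q1 is the smallest height whose cumulative percentage reaches 25%. Below is a minimal sketch of that rule. It uses a small made-up sample (hypothetical, not the WNBA data), chosen so that 175 is the first height to cross 25%:

```python
import pandas as pd

# Hypothetical heights (NOT the real WNBA data): 35 x 170, 1 x 175, 107 x 178, N = 143
heights = pd.Series([170] * 35 + [175] + [178] * 107, name="Height")

# Frequency table with cumulative percentage, like the table in the question
freq = heights.value_counts().sort_index()
cum_pct = freq.cumsum() / freq.sum() * 100

# Nearest-rank Q1: the first height whose cumulative percentage is >= 25
q1_nearest_rank = cum_pct[cum_pct >= 25].index[0]
print(q1_nearest_rank)  # 175
```

Here 170 covers about 24.5% of the sample and 175 about 25.2%, so this rule lands on 175.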

However, when I use Series.describe(), I'm told that the 25th percentile is 176.5. Why is that so?

wnba.Height.describe()
count    143.000000
mean     184.566434
std        8.685068
min      165.000000
25%      176.500000
50%      185.000000
75%      191.000000
max      206.000000
Name: Height, dtype: float64


asked Feb 28 '18 08:02

Alex

1 Answer

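There are several ways to estimate quantiles from a sample. The 175.0 vs. 176.5 difference comes from two of them, which differ in whether the minimum and maximum are treated as the 0th and 100th percentiles:

Inclusive method (the one pandas uses by default): gives 176.5.
Exclusive method: gives 175.0.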
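Series.describe() gets its percentiles from Series.quantile, whose default interpolation='linear' is the inclusive method. NumPy (version 1.22 or later is assumed here) exposes both estimators through the method argument of np.quantile: 'linear' is the inclusive one and 'weibull' is the exclusive one. The sketch below uses a made-up sample (hypothetical, not the WNBA file) to compare them:

```python
import numpy as np
import pandas as pd

# Hypothetical sample (NOT the real WNBA data): 35 x 170, 1 x 175, 107 x 178, N = 143
heights = pd.Series([170] * 35 + [175] + [178] * 107, name="Height")

# describe() reports Series.quantile(0.25), which defaults to interpolation="linear"
q1_describe = heights.describe()["25%"]
q1_quantile = heights.quantile(0.25)

# Inclusive estimator: 1-based position h = (N - 1)p + 1 (NumPy's "linear", the default)
q1_inclusive = np.quantile(heights.to_numpy(), 0.25, method="linear")
# Exclusive estimator: 1-based position h = (N + 1)p (NumPy's "weibull", NumPy >= 1.22)
q1_exclusive = np.quantile(heights.to_numpy(), 0.25, method="weibull")

print(q1_describe, q1_quantile, q1_inclusive, q1_exclusive)  # 176.5 176.5 176.5 175.0
```

pandas' own interpolation options ('linear', 'lower', 'higher', 'midpoint', 'nearest') all use the inclusive position, so the exclusive estimate has to come from NumPy or be computed by hand.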

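The two estimates differ only in how the 1-based (possibly fractional) position h of the quantile is computed. Here x_1 ≤ x_2 ≤ … ≤ x_N are the sorted values and p is the desired quantile: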

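#1 (p being 0.25 in your case)
$$h = (N - 1)\,p + 1$$
$$Q_p = x_{\lfloor h \rfloor} + (h - \lfloor h \rfloor)\,\bigl(x_{\lfloor h \rfloor + 1} - x_{\lfloor h \rfloor}\bigr)$$

#2
$$h = (N + 1)\,p$$
$$Q_p = x_{\lfloor h \rfloor} + (h - \lfloor h \rfloor)\,\bigl(x_{\lfloor h \rfloor + 1} - x_{\lfloor h \rfloor}\bigr)$$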
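With N = 143 and p = 0.25, formula #1 gives h = 142 × 0.25 + 1 = 36.5, so Q1 lies halfway between x_36 and x_37. Formula #2 gives h = 144 × 0.25 = 36, so Q1 = x_36 exactly. Both results in the question are therefore consistent with x_36 = 175 and x_37 = 178. The sketch below implements the two formulas directly. quantile_interp is a hypothetical helper name, and the sample is made up (not the WNBA data):

```python
import math

def quantile_interp(values, p, method="inclusive"):
    """Hypothetical helper: p-th quantile via formula #1 ("inclusive") or #2 ("exclusive")."""
    x = sorted(values)
    n = len(x)
    if method == "inclusive":
        h = (n - 1) * p + 1  # formula #1, 1-based position
    elif method == "exclusive":
        h = (n + 1) * p      # formula #2, 1-based position
    else:
        raise ValueError("method must be 'inclusive' or 'exclusive'")
    # Assumption: clamp h to [1, n] so extreme p returns the min or max
    h = min(max(h, 1), n)
    k = math.floor(h)
    frac = h - k
    # Python lists are 0-based, so x_k from the formula is x[k - 1]
    lower = x[k - 1]
    upper = x[min(k, n - 1)]  # x_{k+1}, or x_n itself when h == n
    return lower + frac * (upper - lower)

# Hypothetical sample (NOT the real WNBA data): 35 x 170, 1 x 175, 107 x 178, N = 143
heights = [170] * 35 + [175] + [178] * 107
print(quantile_interp(heights, 0.25, "inclusive"))  # 176.5
print(quantile_interp(heights, 0.25, "exclusive"))  # 175.0
```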


answered Nov 01 '22 15:11

Gaurav Taneja
