
What are 25%,50%,75% values when we describe a grouped dataframe?

I am going through the pandas groupby docs, and when I group the DataFrame as below:

df:

     A      B         C         D
0  foo    one -0.987674  0.039616
1  bar    one -0.653247 -1.022529
2  foo    two  0.404201  1.308777
3  bar  three  1.620780  0.574377
4  foo    two  1.661942  0.579888
5  bar    two  0.747878  0.463052
6  foo    one  0.070278  0.202564
7  foo  three  0.779684 -0.547192

grouped = df.groupby(['A', 'B'])
grouped.describe()

gives

              C                      ...         D                    
          count      mean       std  ...       50%       75%       max
A   B                                ...                              
bar one     1.0  0.224944       NaN  ...  1.107509  1.107509  1.107509
    three   1.0  0.704943       NaN  ...  1.833098  1.833098  1.833098
    two     1.0 -0.091613       NaN  ... -0.549254 -0.549254 -0.549254
foo one     2.0  0.282298  1.554401  ... -0.334058  0.046640  0.427338
    three   1.0  1.688601       NaN  ... -1.457338 -1.457338 -1.457338
    two     2.0  1.206690  0.917140  ... -0.096405  0.039241  0.174888

What do 25%, 50%, and 75% signify in the described output? A bit of explanation, please.

Codenewbie asked Sep 10 '19 11:09


2 Answers

In simple words...

You will see the percentiles (25%, 50%, 75%) and a value next to each of them.

Their purpose is to summarize the distribution of your data.

For example:

s = pd.Series([1, 2, 3, 1])

s.describe() will give

count    4.000000
mean     1.750000
std      0.957427
min      1.000000
25%      1.000000
50%      1.500000
75%      2.250000
max      3.000000

25% means that at least 25% of your data have the value 1.0 or below. If you sort the data, [1, 1, 2, 3], the first value [1], which is 25% of the data, is less than or equal to 1.

50% means 50% of your data have the value 1.5 or below: [1, 1], which constitute 50% of the data, are less than or equal to 1.5. This value is the median; since no data point sits exactly in the middle, pandas interpolates between the two middle values, 1 and 2.

75% means 75% of your data have the value 2.25 or below: [1, 1, 2], which constitute 75% of the data, are less than or equal to 2.25. Here too the value is interpolated, between 2 and 3.
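As a quick check of the numbers above, here is a minimal sketch using `Series.quantile`, which computes the same percentiles that `describe()` reports. The variable names `q` and `share_at_or_below` are only illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1])

# quantile() returns the 25th, 50th and 75th percentiles,
# the same values that describe() lists as 25%, 50% and 75%.
q = s.quantile([0.25, 0.5, 0.75])

# For each percentile, the fraction of the data at or below that value.
share_at_or_below = {p: float((s <= v).mean()) for p, v in q.items()}

print(q)
print(share_at_or_below)
```

Note that the share at or below the 25% value is 0.5 here, because the value 1 appears twice. With repeated values, a percentile guarantees "at least" that share of the data, not exactly that share.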

Mustapha Babatunde answered Sep 18 '22 21:09


To interpret the min, 25%, 50%, 75% and max values, imagine sorting each column from lowest to highest value. The first (smallest) value is the min. If you go a quarter of the way through the list, you'll find a number that is bigger than 25% of the values and smaller than 75% of the values. That is the 25% value (pronounced "25th percentile"). The 50th and 75th percentiles are defined analogously, and the max is the largest number.
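The "sort and walk through the list" idea can be sketched in plain Python. This assumes linear interpolation between neighbouring values, which is the default pandas uses for quantiles; `percentile_linear` is a hypothetical helper written for illustration, not a pandas function, and it assumes a non-empty list.

```python
import math

def percentile_linear(values, p):
    """Return the percentile at fraction p (0 <= p <= 1) of a non-empty list,
    interpolating linearly between the two nearest sorted values."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)   # fractional position in the sorted list
    lower = math.floor(pos)        # index of the neighbour below
    upper = math.ceil(pos)         # index of the neighbour above
    frac = pos - lower             # how far between the two neighbours
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

data = [1, 2, 3, 1]
result = {p: percentile_linear(data, p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)}
print(result)
```

Here p = 0.0 gives the min and p = 1.0 gives the max, so all five order statistics reported by `describe()` come from the same procedure. With a grouped DataFrame the same calculation is applied to each group separately, which is why a group containing a single row shows the same value for min, 25%, 50%, 75% and max.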

SIBBIR AHMED answered Sep 18 '22 21:09