I am going through the pandas groupby docs, and when I group by a particular column as below:
df:
     A      B         C         D
0  foo    one -0.987674  0.039616
1  bar    one -0.653247 -1.022529
2  foo    two  0.404201  1.308777
3  bar  three  1.620780  0.574377
4  foo    two  1.661942  0.579888
5  bar    two  0.747878  0.463052
6  foo    one  0.070278  0.202564
7  foo  three  0.779684 -0.547192
grouped = df.groupby('A')
grouped.describe()
gives
               C                      ...         D
           count      mean       std  ...       50%       75%       max
A   B                                 ...
bar one      1.0  0.224944       NaN  ...  1.107509  1.107509  1.107509
    three    1.0  0.704943       NaN  ...  1.833098  1.833098  1.833098
    two      1.0 -0.091613       NaN  ... -0.549254 -0.549254 -0.549254
foo one      2.0  0.282298  1.554401  ... -0.334058  0.046640  0.427338
    three    1.0  1.688601       NaN  ... -1.457338 -1.457338 -1.457338
    two      2.0  1.206690  0.917140  ... -0.096405  0.039241  0.174888
What do 25%, 50%, and 75% signify in the describe output? A bit of explanation, please.
In simple words...
You will see the percentiles (25%, 50%, 75%, etc.) with a value next to each of them.
Their purpose is to tell you how your data is distributed.
For example:
s = pd.Series([1, 2, 3, 1])
s.describe() will give
count    4.000000
mean     1.750000
std      0.957427
min      1.000000
25%      1.000000
50%      1.500000
75%      2.250000
max      3.000000
25% means that 25% of your data has the value 1.0 or below. That is, if you were to look at your data manually, at least 25% of it is less than or equal to 1. (You will agree with this if you look at our data [1, 2, 3, 1]: both 1s are less than or equal to 1, which more than covers the lowest 25% of the data.)
50% means that 50% of your data has the value 1.5 or below. [1, 1], which make up 50% of the data, are less than or equal to 1.5. (1.5 is not itself in the data; it is the midpoint of the two middle values, 1 and 2.)
75% means that 75% of your data has the value 2.25 or below. [1, 2, 1], which make up 75% of the data, are less than or equal to 2.25.
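These figures can be checked directly. The following is a minimal sketch, assuming pandas is imported as pd and reusing the same series; the names q and share_at_or_below are only for illustration. It compares the values from quantile() with the fraction of the data at or below each one.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 1])

# quantile() returns the values that describe() labels 25%, 50% and 75%
q = s.quantile([0.25, 0.5, 0.75])

# Fraction of the data that is less than or equal to each percentile value
share_at_or_below = {p: float((s <= v).mean()) for p, v in q.items()}

print(q)
print(share_at_or_below)
```

Note that the share at or below the 25% value comes out as one half here, because the value 1 appears twice in such a small series. On larger data the shares sit much closer to the percentile labels.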
To interpret the min, 25%, 50%, 75% and max values, imagine sorting each column from lowest to highest value. The first (smallest) value is the min. If you go a quarter of the way through the list, you'll find a number that is bigger than 25% of the values and smaller than 75% of the values. That is the 25% value (pronounced "25th percentile"). The 50th and 75th percentiles are defined analogously, and the max is the largest number.
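The "sort and walk through the list" idea can be sketched in a few lines. This is only an illustration: percentile_by_sorting is a hypothetical helper, not a pandas function, and it assumes linear interpolation between neighbouring values, which is the default that pandas uses for quantile().

```python
import math
import pandas as pd

def percentile_by_sorting(values, fraction):
    """Hypothetical helper: sort, go `fraction` of the way through, interpolate."""
    ordered = sorted(values)
    # Position in the sorted list, counted from 0; it may fall between two items
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    # Linear interpolation between the two neighbouring sorted values
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])

data = [1, 2, 3, 1]
manual = [percentile_by_sorting(data, f) for f in (0.25, 0.5, 0.75)]
from_pandas = pd.Series(data).quantile([0.25, 0.5, 0.75]).tolist()

print(manual)
print(from_pandas)
```

For this small series both lists should match the 25%, 50% and 75% rows that describe() prints.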