df:
name  score
A     1
A     2
A     3
A     4
A     5
B     2
B     4
B     6
B     8
I want to get a new dataframe in the following form:
name  count  mean  std  min  25%  50%  75%  max
A     5      3     ..   ..   ..   ..   ..   ..
B     4      5     ..   ..   ..   ..   ..   ..
How can I extract the information from df.describe() and reformat it? Thanks
The describe() method returns a description of the data in the DataFrame. If the DataFrame contains numerical data, the description contains the following information for each column: count (the number of non-empty values), mean (the average value), std (the standard deviation), min, the 25%, 50% and 75% percentiles, and max.
The describe() function generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values. Syntax: DataFrame.describe(percentiles=None, include=None, exclude=None)
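To make the description above concrete, here is a minimal sketch that calls describe() on the score column from the question. The DataFrame is rebuilt by hand, and the percentiles values are only an illustration of that parameter, not something the question asks for.

```python
import pandas as pd

# Rebuild the question's data by hand.
df = pd.DataFrame({
    "name": list("AAAAABBBB"),
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# Default summary: count, mean, std, min, quartiles, max.
summary = df["score"].describe()
print(summary)

# The percentiles argument swaps the default quartiles for other cut points
# (the median is still included).
custom = df["score"].describe(percentiles=[0.1, 0.9])
print(custom)
```

Note that this summarizes the whole column at once; the per-group breakdown the question asks for needs groupby, as the answers show.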
There is an even shorter one :)
print(df.groupby('name').describe().unstack(1))
Nothing beats a one-liner:
In [145]:
print(df.groupby('name').describe().reset_index().pivot(index='name', values='score', columns='level_1'))
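Both one-liners above were written against an older pandas, where groupby(...).describe() came back in a long (stacked) layout that had to be unstacked or pivoted. On recent pandas versions the result should already have one row per group and one column per statistic, so selecting the column is likely enough. A minimal sketch under that assumption (the variable name stats is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "name": list("AAAAABBBB"),
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# Select the column first so the result has flat column labels
# (count, mean, std, ...) rather than a two-level column index.
stats = df.groupby("name")["score"].describe()
print(stats)
```

If name should be an ordinary column rather than the index, appending .reset_index() is one way to get that.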
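In[1]:
import pandas as pd
import io

# Sample data from the question, as whitespace-separated text
data = """
name score
A 1
A 2
A 3
A 4
A 5
B 2
B 4
B 6
B 8
"""
# Raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv(io.StringIO(data), delimiter=r'\s+')
print(df)

Out[1]:
  name  score
0    A      1
1    A      2
2    A      3
3    A      4
4    A      5
5    B      2
6    B      4
7    B      6
8    B      8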
A nice approach to this problem uses a generator expression (see footnote) to allow pd.DataFrame() to iterate over the results of groupby, and construct the summary stats dataframe on the fly:
In[2]:
df2 = pd.DataFrame(
    group.describe().rename(columns={'score': name}).squeeze()
    for name, group in df.groupby('name')
)
print(df2)

Out[2]:
   count  mean       std  min  25%  50%  75%  max
A      5     3  1.581139    1  2.0    3  4.0    5
B      4     5  2.581989    2  3.5    5  6.5    8
Here the squeeze function squeezes out a dimension, converting the one-column group summary stats DataFrame into a Series.
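As a small illustration of that squeeze step in isolation, the sketch below builds a single group by hand (the group for name A) and shows the one-column DataFrame turning into a Series. The variable names are assumptions made for the example.

```python
import pandas as pd

# One group, as groupby('name') would hand it over for name == 'A'.
group = pd.DataFrame({"name": ["A"] * 5, "score": [1, 2, 3, 4, 5]})

# describe() skips the non-numeric 'name' column by default,
# leaving a DataFrame with a single 'score' column.
stats = group.describe()
print(type(stats).__name__, stats.shape)

# Rename the column to the group label, then squeeze to a Series;
# the Series takes its name from that column label.
as_series = stats.rename(columns={"score": "A"}).squeeze()
print(type(as_series).__name__, as_series.name)
```

Because each Series carries the group label as its name, pd.DataFrame() can use those names as row labels when it stacks the Series into the final table.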
Footnote: A generator expression has the form my_function(a) for a in iterator, or, if iterator gives us back two-element tuples, as in the case of groupby: my_function(a,b) for a,b in iterator
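The two-element form from the footnote can be tried without pandas. In this sketch, pairs stands in for what groupby yields, and label_mean is a hypothetical helper invented for the example.

```python
# (label, values) pairs, standing in for the (name, group) tuples from groupby.
pairs = [("A", [1, 2, 3, 4, 5]), ("B", [2, 4, 6, 8])]


def label_mean(name, values):
    """Hypothetical helper: format a label together with the mean of its values."""
    return f"{name}: {sum(values) / len(values)}"


# Nothing is computed when the generator is created; items are produced
# one at a time as the consumer (here, list) asks for them.
gen = (label_mean(name, values) for name, values in pairs)
result = list(gen)
print(result)
```

When a generator expression is the only argument to a function call, as in the pd.DataFrame(...) call above, the extra pair of parentheses around it can be dropped.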