Pandas dataframe: how to apply describe() to each group and add to new columns?

Tags: python, pandas, dataframe, numpy

df:

    name score
    A      1
    A      2
    A      3
    A      4
    A      5
    B      2
    B      4
    B      6
    B      8

I want to get a new dataframe in the following form:

    name  count  mean  std  min  25%  50%  75%  max
    A         5     3   ..   ..   ..   ..   ..   ..
    B         4     5   ..   ..   ..   ..   ..   ..

How can I extract the information from df.describe() and reformat it? Thanks

391

asked Nov 06 '15 20:11

Robin1988

2 Answers

there is even a shorter one :)

    print(df.groupby('name').describe().unstack(1))

Nothing beats one-liner:

    In [145]: print(df.groupby('name').describe().reset_index().pivot(index='name', values='score', columns='level_1'))
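For readers on a recent pandas release, here is a minimal sketch rather than a definitive recipe. Since pandas 0.20, groupby(...).describe() places the statistics in the columns with one row per group, so the unstack and pivot steps above, which were written for the older stacked output, should no longer be needed. Selecting the score column before calling describe() keeps the column labels flat. The names summary and summary_flat are illustrative only.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "name": ["A"] * 5 + ["B"] * 4,
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# Select the column first so the result has flat column labels
# (count, mean, std, min, 25%, 50%, 75%, max) and one row per group.
summary = df.groupby("name")["score"].describe()
print(summary)

# Optional: turn the group label back into an ordinary 'name' column.
summary_flat = summary.reset_index()
print(summary_flat)
```

If several numeric columns need summarising at once, dropping the ["score"] selection returns the same statistics under two-level column labels.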

197

answered Sep 20 '22 08:09

Andrey Vykhodtsev

Define some data

In[1]:

    import pandas as pd
    import io

    data = """
    name score
    A      1
    A      2
    A      3
    A      4
    A      5
    B      2
    B      4
    B      6
    B      8
    """

    # Raw string so that \s is passed to the regex engine unchanged
    df = pd.read_csv(io.StringIO(data), delimiter=r'\s+')
    print(df)

Out[1]:

      name  score
    0    A      1
    1    A      2
    2    A      3
    3    A      4
    4    A      5
    5    B      2
    6    B      4
    7    B      6
    8    B      8

Solution

A nice approach to this problem uses a generator expression (see footnote) to allow pd.DataFrame() to iterate over the results of groupby, and construct the summary stats dataframe on the fly:

In[2]:

    df2 = pd.DataFrame(group.describe().rename(columns={'score': name}).squeeze()
                       for name, group in df.groupby('name'))

    print(df2)

Out[2]:

       count  mean       std  min  25%  50%  75%  max
    A      5     3  1.581139    1  2.0    3  4.0    5
    B      4     5  2.581989    2  3.5    5  6.5    8

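Here squeeze() drops the redundant column dimension, converting each group's one-column summary-statistics DataFrame into a Series. Because the column was renamed to the group name first, that Series carries the group name, which pd.DataFrame() then uses as the row label.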
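To see the effect of squeeze() in isolation, here is a small sketch on a single hand-built group. The variable names group, stats and as_series are illustrative only.

```python
import pandas as pd

# One group, built by hand to mimic what groupby would yield for 'A'.
group = pd.DataFrame({"name": ["A"] * 5, "score": [1, 2, 3, 4, 5]})

# describe() summarises only the numeric column here,
# so the result is a DataFrame with a single 'score' column.
stats = group.describe()

# squeeze() collapses the length-1 column axis, leaving a Series
# indexed by statistic name (count, mean, std, ...).
as_series = stats.squeeze()

print(type(stats).__name__, stats.shape)
print(type(as_series).__name__, as_series.shape)
```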

Footnote: A generator expression has the form my_function(a) for a in iterator or, if the iterator yields two-element tuples, as groupby does: my_function(a, b) for a, b in iterator.
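As a small illustration of the two-element form, the sketch below iterates over groupby with a generator expression and collects one value per group. The names pairs and means are illustrative, and the mean is used only as a stand-in for my_function.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A"] * 5 + ["B"] * 4,
    "score": [1, 2, 3, 4, 5, 2, 4, 6, 8],
})

# groupby yields (group_name, group_dataframe) tuples, so the
# generator expression unpacks two names per iteration.
# Nothing is computed until the generator is consumed.
pairs = ((name, float(group["score"].mean()))
         for name, group in df.groupby("name"))

# dict() consumes the generator and builds {group_name: mean}.
means = dict(pairs)
print(means)
```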