
Pandas sort by group aggregate and column

Given the following dataframe:

In [31]: import numpy as np
         import pandas as pd

         rand = np.random.RandomState(1)
         df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                            'B': rand.randn(6),
                            'C': rand.rand(6) > .5})

In [32]: df
Out[32]:
     A         B      C
0  foo  1.624345  False
1  bar -0.611756   True
2  baz -0.528172  False
3  foo -1.072969   True
4  bar  0.865408  False
5  baz -2.301539   True

I would like to sort it by group (A) using the aggregated sum of B, and then, within each group, by the value in C (not aggregated). So basically, get the order of the A groups with:

In [28]: df.groupby('A').sum().sort_values('B')
Out[28]:
            B  C
A
baz -2.829710  1
bar  0.253651  1
foo  0.551377  1

And then by True/False, so that it ultimately looks like this:

In [30]: df.loc[[5, 2, 1, 4, 3, 0]]
Out[30]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False

How can this be done?

asked Feb 18 '13 16:02 by beardc



1 Answer

Group by A:

In [0]: grp = df.groupby('A') 

Within each group, sum over B and broadcast the values using transform. Then sort by B:

In [1]: grp[['B']].transform('sum').sort_values('B')
Out[1]:
          B
2 -2.829710
5 -2.829710
1  0.253651
4  0.253651
0  0.551377
3  0.551377
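To make the role of transform in this step concrete, here is a minimal sketch contrasting it with a plain aggregation. The frame is rebuilt from the rounded values printed in the question rather than from the random generator, so that is an assumption; the point is only that sum() returns one row per group, while transform('sum') returns one value per original row, aligned to the original index.

```python
import pandas as pd

# Values copied from the question's printed frame (rounded to 6 decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Aggregation: one row per group, indexed by the group key.
group_sums = df.groupby('A')['B'].sum()

# Transform: the group sum repeated for every row, indexed like df.
row_sums = df.groupby('A')['B'].transform('sum')

print(group_sums.shape)  # (3,)
print(row_sums.shape)    # (6,)
```

Because row_sums shares df's index, sorting it yields an index that can be used to reorder df directly.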

Index the original df by passing the index from above. This will re-order the A values by the aggregate sum of the B values:

In [2]: sort1 = df.loc[grp[['B']].transform('sum').sort_values('B').index]

In [3]: sort1
Out[3]:
     A         B      C
2  baz -0.528172  False
5  baz -2.301539   True
1  bar -0.611756   True
4  bar  0.865408  False
0  foo  1.624345  False
3  foo -1.072969   True
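Rows in the same group share an identical sum, so this sort always contains ties. The sketch below is one way to make the row order within ties deterministic, by requesting a stable algorithm (kind='mergesort'); that choice is an addition for illustration and not part of the original answer. The frame is again rebuilt from the question's rounded values.

```python
import pandas as pd

# Values copied from the question's printed frame (rounded to 6 decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Per-row group sum of B, aligned to df's index.
key = df.groupby('A')['B'].transform('sum')

# A stable sort keeps tied rows (same group) in their original relative order.
order = key.sort_values(kind='mergesort').index

# Reorder the original frame by label.
sort1 = df.loc[order]
print(sort1.index.tolist())  # [2, 5, 1, 4, 0, 3]
```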

Finally, sort the 'C' values within each group of 'A', passing sort=False to groupby to preserve the A order from the previous step:

In [4]: f = lambda x: x.sort_values('C', ascending=False)

In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)

In [6]: sort2
Out[6]:
         A         B      C
A
baz 5  baz -2.301539   True
    2  baz -0.528172  False
bar 1  bar -0.611756   True
    4  bar  0.865408  False
foo 3  foo -1.072969   True
    0  foo  1.624345  False
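The effect of sort=False can be checked in isolation. This short sketch, using a frame rebuilt from the question's rounded values and already reordered as in the previous step, lists the group keys in the order groupby yields them: with sort=False the groups follow their first appearance in the frame, while the default sorts the keys alphabetically.

```python
import pandas as pd

# Values copied from the question's printed frame (rounded to 6 decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# The frame as ordered by the group sum of B (baz, bar, foo).
sort1 = df.loc[[2, 5, 1, 4, 0, 3]]

# sort=False: groups come out in order of first appearance.
keys_unsorted = [name for name, _ in sort1.groupby('A', sort=False)]

# Default (sort=True): group keys are sorted.
keys_sorted = [name for name, _ in sort1.groupby('A')]

print(keys_unsorted)  # ['baz', 'bar', 'foo']
print(keys_sorted)    # ['bar', 'baz', 'foo']
```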

Clean up the df index by using reset_index with drop=True:

In [7]: sort2.reset_index(0, drop=True)
Out[7]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
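As a possible alternative to the steps above, the same ordering can be sketched as a single multi-column sort: attach the group sum as a temporary column, sort by that column ascending and by C descending, then drop the helper. The column name B_sum is hypothetical, and the frame is rebuilt from the question's rounded values. One caveat: if two different groups happened to have exactly the same sum, their rows would be interleaved by C, so this assumes distinct group sums.

```python
import pandas as pd

# Values copied from the question's printed frame (rounded to 6 decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

result = (
    # B_sum is a hypothetical helper column holding each row's group sum.
    df.assign(B_sum=df.groupby('A')['B'].transform('sum'))
      # Groups by ascending sum; within a group, True before False.
      .sort_values(['B_sum', 'C'], ascending=[True, False])
      .drop(columns='B_sum')
)

print(result.index.tolist())  # [5, 2, 1, 4, 3, 0]
```

This avoids groupby().apply() and the index cleanup afterwards, at the cost of a temporary column.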
answered Sep 18 '22 02:09 by Zelazny7