Given the following DataFrame:

    In [31]: rand = np.random.RandomState(1)
             df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                                'B': rand.randn(6),
                                'C': rand.rand(6) > .5})

    In [32]: df
    Out[32]:
         A         B      C
    0  foo  1.624345  False
    1  bar -0.611756   True
    2  baz -0.528172  False
    3  foo -1.072969   True
    4  bar  0.865408  False
    5  baz -2.301539   True

I would like to sort it in groups (A) by the aggregated sum of B, and then by the value in C (not aggregated). In other words, first get the order of the A groups with:

    In [28]: df.groupby('A').sum().sort_values('B')
    Out[28]:
                    B  C
    A
    baz -2.829710  1
    bar  0.253651  1
    foo  0.551377  1

And then by True/False, so that it ultimately looks like this:

    In [30]: df.loc[[5, 2, 1, 4, 3, 0]]
    Out[30]:
         A         B      C
    5  baz -2.301539   True
    2  baz -0.528172  False
    1  bar -0.611756   True
    4  bar  0.865408  False
    3  foo -1.072969   True
    0  foo  1.624345  False

How can this be done?
Sort within Groups of groupby() Result in DataFrame

To sort rows within the groups of a groupby() result, combine DataFrame.groupby(), which groups the rows, with DataFrame.sort_values(), which sorts a DataFrame in ascending or descending order. Note that groupby preserves the order of rows within each group, so rows that are sorted before grouping stay sorted inside each group.
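A minimal sketch of sorting within groups, assuming the A and B values from the question's printed output (copied as literals rather than regenerated from the random seed). Sorting by the group key and then by the value keeps each group contiguous and ordered; sorting by B before grouping shows that groupby keeps that row order, so .first() returns the smallest B of each group.

```python
import pandas as pd

# A and B values copied from the printed DataFrame in the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "baz"] * 2,
    "B": [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
})

# Sort by the group key first, then by the value: rows of each group end up
# contiguous and ordered by B inside the group.
within = df.sort_values(["A", "B"])

# groupby preserves the row order inside each group, so after sorting by B
# the first row of every group holds that group's smallest B.
smallest = df.sort_values("B").groupby("A")["B"].first()

print(within)
print(smallest)
```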
To group a pandas DataFrame, use groupby(). To sort the grouped result in ascending or descending order, use sort_values(). The size() method returns the number of rows in each group.
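A short illustration of counting rows per group with size() and ordering the counts with sort_values(). The frame here is hypothetical: the question's frame has exactly two rows in every group, so uneven group sizes are used to make the ordering visible.

```python
import pandas as pd

# Hypothetical data with uneven group sizes (not from the question).
df = pd.DataFrame({"A": ["foo", "bar", "baz", "foo", "foo", "bar"]})

# size() returns a Series with one entry per group: its row count.
counts = df.groupby("A").size()

# Order the groups from largest to smallest.
largest_first = counts.sort_values(ascending=False)

print(largest_first)
```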
Use DataFrame.groupby().sum() to group rows by one or more columns and compute the sum of each group. groupby() returns a DataFrameGroupBy object whose sum() method computes the sum of the selected column(s) for each group.
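A minimal sketch of groupby().sum() on the question's data, with the values copied from its printed output. Grouping by one column gives one total per A value; grouping by two columns gives one total per (A, C) pair. Summing a boolean column counts its True values.

```python
import pandas as pd

# Values copied from the printed DataFrame in the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "baz"] * 2,
    "B": [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    "C": [False, True, False, True, False, True],
})

# One key column: the sum of B for each A value.
b_by_a = df.groupby("A")["B"].sum()

# Several aggregated columns: summing the boolean C counts its True rows.
both_by_a = df.groupby("A")[["B", "C"]].sum()

# Several key columns: one sum of B per (A, C) combination.
b_by_a_c = df.groupby(["A", "C"])["B"].sum()

print(b_by_a.sort_values())
print(both_by_a)
print(b_by_a_c)
```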
Group by A:

    In [0]: grp = df.groupby('A')

Within each group, sum over B and broadcast the values back to every row using transform. Then sort by B:

    In [1]: grp[['B']].transform('sum').sort_values('B')
    Out[1]:
              B
    2 -2.829710
    5 -2.829710
    1  0.253651
    4  0.253651
    0  0.551377
    3  0.551377

Index the original df with the index from above. This reorders the A groups by the aggregate sum of their B values:

    In [2]: sort1 = df.loc[grp[['B']].transform('sum').sort_values('B').index]

    In [3]: sort1
    Out[3]:
         A         B      C
    2  baz -0.528172  False
    5  baz -2.301539   True
    1  bar -0.611756   True
    4  bar  0.865408  False
    0  foo  1.624345  False
    3  foo -1.072969   True

Finally, sort the C values within each group of A, passing sort=False to groupby so that the A order from the previous step is preserved:

    In [4]: f = lambda x: x.sort_values('C', ascending=False)

    In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)

    In [6]: sort2
    Out[6]:
                 A         B      C
    A
    baz 5  baz -2.301539   True
        2  baz -0.528172  False
    bar 1  bar -0.611756   True
        4  bar  0.865408  False
    foo 3  foo -1.072969   True
        0  foo  1.624345  False

Clean up the index by using reset_index with drop=True:

    In [7]: sort2.reset_index(0, drop=True)
    Out[7]:
         A         B      C
    5  baz -2.301539   True
    2  baz -0.528172  False
    1  bar -0.611756   True
    4  bar  0.865408  False
    3  foo -1.072969   True
    0  foo  1.624345  False
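For comparison, a sketch of a different technique from the step-by-step answer: instead of groupby().apply, broadcast each group's B sum onto its rows with transform('sum') and sort once by that helper column and by C. The helper column name B_sum is hypothetical, and the frame is rebuilt from the printed values rather than from the random seed. A is included as a middle sort key so that two groups with equal sums would not interleave.

```python
import pandas as pd

# Values copied from the printed DataFrame in the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "baz"] * 2,
    "B": [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    "C": [False, True, False, True, False, True],
})

# Broadcast each group's sum of B back onto every row of that group.
group_sum = df.groupby("A")["B"].transform("sum")

result = (
    df.assign(B_sum=group_sum)  # hypothetical helper column
      # Group sum ascending, A as a tie-breaker, then True before False in C.
      .sort_values(["B_sum", "A", "C"], ascending=[True, True, False])
      .drop(columns="B_sum")
)

print(result)
```

Sorting once on all keys avoids calling a Python function per group and keeps the original index labels, so no reset_index step is needed.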