Given the following dataframe:

    In [31]: rand = np.random.RandomState(1)
             df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                                'B': rand.randn(6),
                                'C': rand.rand(6) > .5})

    In [32]: df
    Out[32]:
         A         B      C
    0  foo  1.624345  False
    1  bar -0.611756   True
    2  baz -0.528172  False
    3  foo -1.072969   True
    4  bar  0.865408  False
    5  baz -2.301539   True

I would like to sort it in groups (A) by the aggregated sum of B, and then by the value in C (not aggregated). So basically get the order of the A groups with

    In [28]: df.groupby('A').sum().sort_values('B')
    Out[28]:
                B  C
    A
    baz -2.829710  1
    bar  0.253651  1
    foo  0.551377  1

And then by True/False, so that it ultimately looks like this:

    In [30]: df.loc[[5, 2, 1, 4, 3, 0]]
    Out[30]:
         A         B      C
    5  baz -2.301539   True
    2  baz -0.528172  False
    1  bar -0.611756   True
    4  bar  0.865408  False
    3  foo -1.072969   True
    0  foo  1.624345  False

How can this be done?
Sorting within the groups of a groupby() result: DataFrame.sort_values() sorts a DataFrame in ascending or descending order, and DataFrame.groupby() groups its rows. Note that groupby preserves the order of rows within each group, so rows sorted before grouping stay sorted inside each group.
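The order-preservation point can be checked with a minimal sketch. It assumes a recent pandas, and it rebuilds the question's frame from the printed values instead of relying on the random generator:

```python
import pandas as pd

# Values copied from the DataFrame printed in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Sort the whole frame by C first (True before False).
by_c = df.sort_values('C', ascending=False, kind='stable')

# Within each group, rows keep the order they had in by_c.
for name, group in by_c.groupby('A'):
    print(name, group.index.tolist())
# bar [1, 4]
# baz [5, 2]
# foo [3, 0]
```

The groups themselves come out in sorted key order (bar, baz, foo) because groupby sorts keys by default; only the row order inside each group is inherited from the earlier sort.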
To group a pandas DataFrame, use groupby(). To sort the result in ascending or descending order, use sort_values(). The size() method of a groupby object returns the number of rows in each group.
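As a small illustration of size() combined with sort_values(), the sketch below uses a hypothetical frame named toy with unequal group sizes. The data is made up for this example and is not part of the question:

```python
import pandas as pd

# Hypothetical data, chosen so the groups have different sizes.
toy = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'baz', 'foo', 'bar']})

# size() counts the rows in each group; sort_values orders groups by count.
counts = toy.groupby('A').size().sort_values(ascending=False)
print(counts.to_dict())  # {'foo': 3, 'bar': 2, 'baz': 1}
```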
Use DataFrame.groupby().sum() to group rows by one or more columns and compute per-group sums. groupby() returns a DataFrameGroupBy object, whose sum() method calculates the sum of each column for each group.
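A minimal sketch of the per-group sum, restricted to column B so that the boolean column C is not summed as well. The frame is rebuilt from the values printed in the question:

```python
import pandas as pd

# Values copied from the DataFrame printed in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# One sum per group, ordered from smallest to largest.
group_sums = df.groupby('A')['B'].sum().sort_values()
print(group_sums.index.tolist())  # ['baz', 'bar', 'foo']
```

This gives the desired order of the A groups, but as a three-row result it cannot by itself reorder the six original rows.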
Groupby A:

    In [0]: grp = df.groupby('A')

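Within each group, sum over B and broadcast the values using transform. Then sort by B, using a stable sort so that rows of the same group keep their original relative order:

    In [1]: grp[['B']].transform('sum').sort_values('B', kind='stable')
    Out[1]:
              B
    2 -2.829710
    5 -2.829710
    1  0.253651
    4  0.253651
    0  0.551377
    3  0.551377
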
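To see why transform is used here instead of a plain aggregation, the following sketch compares the two results. It assumes a recent pandas; the names per_group and per_row are introduced only for this illustration:

```python
import pandas as pd

# Values copied from the DataFrame printed in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Aggregation: one value per group, indexed by the group key.
per_group = df.groupby('A')['B'].sum()

# Transform: the group's sum repeated on every row, indexed like df.
per_row = df.groupby('A')['B'].transform('sum')

print(len(per_group), len(per_row))  # 3 6
```

Because per_row shares its index with df, sorting it yields row labels that can be used directly to reorder the original frame.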
Index the original df by passing the index from above. This will re-order the A values by the aggregate sum of the B values:

    In [2]: sort1 = df.loc[grp[['B']].transform('sum').sort_values('B', kind='stable').index]

    In [3]: sort1
    Out[3]:
         A         B      C
    2  baz -0.528172  False
    5  baz -2.301539   True
    1  bar -0.611756   True
    4  bar  0.865408  False
    0  foo  1.624345  False
    3  foo -1.072969   True

Finally, sort the 'C' values within groups of 'A' using the sort=False option to preserve the A sort order from step 1:

    In [4]: f = lambda x: x.sort_values('C', ascending=False)

    In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)

    In [6]: sort2
    Out[6]:
             A         B      C
    A
    baz 5  baz -2.301539   True
        2  baz -0.528172  False
    bar 1  bar -0.611756   True
        4  bar  0.865408  False
    foo 3  foo -1.072969   True
        0  foo  1.624345  False

Clean up the df index by using reset_index with drop=True:

    In [7]: sort2.reset_index(0, drop=True)
    Out[7]:
         A         B      C
    5  baz -2.301539   True
    2  baz -0.528172  False
    1  bar -0.611756   True
    4  bar  0.865408  False
    3  foo -1.072969   True
    0  foo  1.624345  False

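The same ordering can probably be reached more directly on recent pandas versions. The sketch below is an alternative technique, not the answer's groupby/apply approach: it stores the broadcast group sum in a temporary helper column (hypothetically named B_sum here), sorts on that column and on C together, and then drops the helper. The frame is rebuilt from the values printed in the question.

```python
import pandas as pd

# Values copied from the DataFrame printed in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

result = (
    df
    # B_sum is a hypothetical helper column: each row gets its group's sum of B.
    .assign(B_sum=df.groupby('A')['B'].transform('sum'))
    # Groups ordered by ascending sum; within a group, True before False.
    .sort_values(['B_sum', 'C'], ascending=[True, False])
    # Remove the helper so only the original columns remain.
    .drop(columns='B_sum')
)
print(result.index.tolist())  # [5, 2, 1, 4, 3, 0]
```

One caveat of this design: two different groups with exactly equal sums would interleave in the sort, so adding 'A' as a middle sort key may be safer for data where that can happen.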