Pandas sort by group aggregate and column

Tags: python, sorting, pandas, dataframe, group-by

Given the following DataFrame:

    In [31]: rand = np.random.RandomState(1)
             df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                                'B': rand.randn(6),
                                'C': rand.rand(6) > .5})

    In [32]: df
    Out[32]:
         A         B      C
    0  foo  1.624345  False
    1  bar -0.611756   True
    2  baz -0.528172  False
    3  foo -1.072969   True
    4  bar  0.865408  False
    5  baz -2.301539   True

I would like to sort the rows so that the groups of A are ordered by the aggregated sum of B, and the rows within each group are then ordered by the value of C (not aggregated). So basically, get the order of the A groups with

    In [28]: df.groupby('A').sum().sort('B')
    Out[28]:
                B  C
    A
    baz -2.829710  1
    bar  0.253651  1
    foo  0.551377  1

and then sort by True/False within each group, so that the result ultimately looks like this:

    In [30]: df.ix[[5, 2, 1, 4, 3, 0]]
    Out[30]:
         A         B      C
    5  baz -2.301539   True
    2  baz -0.528172  False
    1  bar -0.611756   True
    4  bar  0.865408  False
    3  foo -1.072969   True
    0  foo  1.624345  False

How can this be done?

210

asked Feb 18 '13 16:02

beardc

1 Answer

Group by A:

In [0]: grp = df.groupby('A')

Within each group, sum over B and broadcast the values using transform. Then sort by B:

    In [1]: grp[['B']].transform(sum).sort('B')
    Out[1]:
              B
    2 -2.829710
    5 -2.829710
    1  0.253651
    4  0.253651
    0  0.551377
    3  0.551377
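`DataFrame.sort` has since been removed from pandas, so the step above will not run on current releases. A minimal sketch of the same broadcast-and-sort step with `sort_values` follows. It assumes a recent pandas version, rebuilds the frame from the rounded values printed in the question rather than from the random generator, and the names `group_sum` and `ordered` are illustrative only.

```python
import pandas as pd

# Values copied from the frame displayed in the question (six decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Broadcast each group's sum of B back onto the rows of that group.
group_sum = df.groupby('A')['B'].transform('sum')

# Sort by the broadcast sums; mergesort is stable, so rows that share a
# sum keep their original relative order.
ordered = group_sum.sort_values(kind='mergesort')
print(ordered.index.tolist())
```

The resulting index order is what the next step uses to reorder the original frame.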

Index the original df by passing the index from above. This will re-order the A values by the aggregate sum of the B values:

    In [2]: sort1 = df.ix[grp[['B']].transform(sum).sort('B').index]

    In [3]: sort1
    Out[3]:
         A         B      C
    2  baz -0.528172  False
    5  baz -2.301539   True
    1  bar -0.611756   True
    4  bar  0.865408  False
    0  foo  1.624345  False
    3  foo -1.072969   True
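The `.ix` indexer has also been removed from pandas. As a hedged sketch, label-based `.loc` can stand in for it here, because the sorted index holds row labels. The frame is again rebuilt from the printed values, and `order` is a name introduced only for this example.

```python
import pandas as pd

# Values copied from the frame displayed in the question (six decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Row labels ordered by the per-group sum of B (stable sort keeps ties in place).
order = df.groupby('A')['B'].transform('sum').sort_values(kind='mergesort').index

# .loc selects rows by label, in the order the labels are given.
sort1 = df.loc[order]
print(sort1)
```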

Finally, sort by 'C' within each group of 'A'. Pass sort=False to groupby so that the groups keep the order established in the previous step instead of being re-sorted by key:

    In [4]: f = lambda x: x.sort('C', ascending=False)

    In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)

    In [6]: sort2
    Out[6]:
             A         B      C
    A
    baz 5  baz -2.301539   True
        2  baz -0.528172  False
    bar 1  bar -0.611756   True
        4  bar  0.865408  False
    foo 3  foo -1.072969   True
        0  foo  1.624345  False
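To illustrate what `sort=False` does in this step, the sketch below iterates over the groups instead of calling `apply`, sorts each piece with `sort_values`, and concatenates the pieces. This is a swapped-in variant rather than the code above, chosen because the behaviour of `groupby(...).apply` with the grouping column has changed across pandas versions. The names `pieces` and `group_keys` are illustrative, and `sort1` is rebuilt directly from the row order shown in the previous step.

```python
import pandas as pd

# Values copied from the frame displayed in the question (six decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Row order produced by the previous step (groups ordered by their sum of B).
sort1 = df.loc[[2, 5, 1, 4, 0, 3]]

# With sort=False, groups are yielded in order of first appearance,
# so the baz, bar, foo order is kept rather than sorted alphabetically.
group_keys = [key for key, _ in sort1.groupby('A', sort=False)]

# Sort each group by C (True first) and stitch the pieces back together.
pieces = [group.sort_values('C', ascending=False)
          for _, group in sort1.groupby('A', sort=False)]
sort2 = pd.concat(pieces)
print(sort2)
```

Because the pieces are concatenated directly, the result keeps the original flat row labels and no extra index level is created.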

Clean up the df index by using reset_index with drop=True:

    In [7]: sort2.reset_index(0, drop=True)
    Out[7]:
         A         B      C
    5  baz -2.301539   True
    2  baz -0.528172  False
    1  bar -0.611756   True
    4  bar  0.865408  False
    3  foo -1.072969   True
    0  foo  1.624345  False
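For readers on a current pandas release, the whole procedure can probably be condensed into a single multi-key sort. The sketch below is an alternative to the steps above, not the same code: it assumes a recent pandas version, uses a temporary helper column named `B_sum` (a name made up for this example), and rebuilds the frame from the printed values.

```python
import pandas as pd

# Values copied from the frame displayed in the question (six decimals).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

result = (
    # Attach each row's group sum of B as a temporary sort key.
    df.assign(B_sum=df.groupby('A')['B'].transform('sum'))
      # Order groups by their sum (ascending), then rows by C (True first).
      .sort_values(['B_sum', 'C'], ascending=[True, False])
      # Remove the helper column again.
      .drop(columns='B_sum')
)
print(result)
```

One caveat: if two different groups happened to have exactly the same sum, their rows would interleave. Adding 'A' as a sort key between 'B_sum' and 'C' would keep such groups apart.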

198

answered Sep 18 '22 02:09

Zelazny7
