

Pandas: transforming the DataFrameGroupBy object to desired format

Tags: python, pandas, dataframe, group-by

I have a data frame as follows:

import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
                   'code' : ['one', 'one', 'two', 'three',
                             'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white','white','white',
                           'black', 'black', 'white', 'white'],
                   'amount' : np.random.randn(8)},  columns= ['id','code','colour','amount'])

I want to be able to group the ids by code and colour and then sort them with respect to amount. I know how to groupby():

df.groupby(['code','colour']).head(5)
                id   code colour    amount
code  colour                              
one   black  0   1    one  black -0.117307
      white  1   2    one  white  1.653216
             6   7    one  white  0.817205
three black  5   6  three  black  0.567162
      white  3   4  three  white  0.579074
two   black  4   5    two  black -1.683988
      white  2   3    two  white -0.457722
             7   8    two  white -1.277020

However, my desired output is as below, with two columns: (1) code/colour contains the key strings, and (2) id:amount contains id-amount pairs sorted in descending order with respect to amount:

code/colour  id:amount
one/black    {1:-0.117307}
one/white    {2:1.653216, 7:0.817205}
three/black  {6:0.567162}
three/white  {4:0.579074}
two/black    {5:-1.683988}
two/white    {3:-0.457722, 8:-1.277020}

How can I transform the DataFrameGroupBy object displayed above into my desired format? Or should I not use groupby() in the first place?

EDIT: Although not in the specified format, the code below kind of gives me the functionality I want:

groups = dict(list(df.groupby(['code','colour'])))
groups['one','white']
   id code colour    amount
1   2  one  white  1.331766
6   7  one  white  0.808739

How can I reduce the groups to only include the id and amount columns?


asked Jan 14 '14 10:01

Zhubarb

1 Answer

First, group by code and colour, then apply a custom function that turns each group into a dictionary mapping id to amount:

df = df.groupby(['code', 'colour']).apply(lambda x:x.set_index('id').to_dict('dict')['amount'])
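To see what that lambda produces, it can help to run the same steps on a single group. This is only a sketch: it swaps np.random.randn for the fixed amounts printed in the question so the result is reproducible, and the names one_white and id_to_amount are illustrative, not part of the original answer.

```python
import pandas as pd

# Fixed amounts (copied from the question's sample output) replace
# np.random.randn so the result is reproducible.
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount': [-0.117307, 1.653216, -0.457722, 0.579074,
                              -1.683988, 0.567162, 0.817205, -1.277020]},
                  columns=['id', 'code', 'colour', 'amount'])

# Pull out one group and apply the same steps as the lambda:
# index the rows by id, convert to nested dicts, keep the 'amount' dict.
one_white = df.groupby(['code', 'colour']).get_group(('one', 'white'))
id_to_amount = one_white.set_index('id').to_dict('dict')['amount']
print(id_to_amount)
```

For the ('one', 'white') group this gives a plain dictionary with ids 2 and 7 as keys, which is the value that ends up in each row of the grouped result.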

Then flatten the (code, colour) MultiIndex into code/colour strings:

df.index = ['/'.join(i) for i in df.index]

At this point df is a Series. You can convert it back to a DataFrame with:

df = df.reset_index()

Finally, set the column names:

df.columns=['code/colour','id:amount']

Result (note that the dictionaries are keyed by id and are not ordered by amount):

In [105]: df
Out[105]: 
   code/colour                               id:amount
0    one/black                     {1: 0.392264412544}
1    one/white  {2: 2.13950686015, 7: -0.393002947047}
2  three/black                      {6: -2.0766612539}
3  three/white                     {4: -1.18058561325}
4    two/black                     {5: -1.51959565941}
5    two/white  {8: -1.7659863039, 3: -0.595666853895}
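The question also asked for the pairs to be sorted by amount in descending order, which the steps above do not do. One possible variant is sketched below. It assumes Python 3.7+ (so dictionaries keep insertion order), again uses the fixed amounts from the question instead of random data, and the helper name ids_by_amount is made up for illustration.

```python
import pandas as pd

# Fixed amounts (copied from the question's sample output) for reproducibility.
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three',
                            'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount': [-0.117307, 1.653216, -0.457722, 0.579074,
                              -1.683988, 0.567162, 0.817205, -1.277020]},
                  columns=['id', 'code', 'colour', 'amount'])


def ids_by_amount(group):
    # Sort the group's rows by amount, largest first, then build an
    # id -> amount dict; insertion order preserves the sort order.
    ordered = group.sort_values('amount', ascending=False)
    return dict(zip(ordered['id'].tolist(), ordered['amount'].tolist()))


# Selecting ['id', 'amount'] passes only the needed columns to the helper.
result = df.groupby(['code', 'colour'])[['id', 'amount']].apply(ids_by_amount)

# Flatten the (code, colour) keys and turn the Series into a two-column frame.
result.index = ['/'.join(key) for key in result.index]
result = result.rename_axis('code/colour').reset_index(name='id:amount')
print(result)
```

Keeping the intermediate result in its own variable, rather than overwriting df, leaves the original frame available for further grouping.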

answered Sep 18 '22 04:09

waitingkuo

