I have a data frame as follows:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id' : range(1,9),
                   'code' : ['one', 'one', 'two', 'three',
                             'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white',
                              'black', 'black', 'white', 'white'],
                   'amount' : np.random.randn(8)}, columns=['id', 'code', 'colour', 'amount'])
I want to be able to group the ids by code and colour and then sort them with respect to amount. I know how to use groupby():
df.groupby(['code','colour']).head(5)
                id   code colour    amount
code  colour
one   black  0   1    one  black -0.117307
      white  1   2    one  white  1.653216
             6   7    one  white  0.817205
three black  5   6  three  black  0.567162
      white  3   4  three  white  0.579074
two   black  4   5    two  black -1.683988
      white  2   3    two  white -0.457722
             7   8    two  white -1.277020
However, my desired output is as below, with two columns: (1) code/colour, which contains the key strings, and (2) id:amount, which contains id-amount pairs sorted in descending order of amount:
code/colour id:amount
one/black {1:-0.117307}
one/white {2:1.653216, 7:0.817205}
three/black {6:0.567162}
three/white {4:0.579074}
two/black {5:-1.683988}
two/white {3:-0.457722, 8:-1.277020}
How can I transform the DataFrameGroupBy object displayed above into my desired format? Or should I not use groupby() in the first place?
EDIT: Although not in the specified format, the code below kind of gives me the functionality I want:
groups = dict(list(df.groupby(['code','colour'])))
groups['one','white']
id code colour amount
1 2 one white 1.331766
6 7 one white 0.808739
How can I reduce the groups to include only the id and amount columns?
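One possible way to do this, sketched under the assumption that a plain dict of per-group DataFrames is acceptable: select the two columns inside a dict comprehension over the groupby object. The amounts below are copied from the groupby output shown earlier instead of np.random.randn, only so the result is reproducible.

```python
import pandas as pd

# Amounts copied from the question's groupby output (instead of random numbers)
df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white', 'black', 'black', 'white', 'white'],
                   'amount': [-0.117307, 1.653216, -0.457722, 0.579074,
                              -1.683988, 0.567162, 0.817205, -1.277020]})

# Iterating a groupby yields (key, sub-DataFrame) pairs; keep only id and amount per group
groups = {key: grp[['id', 'amount']] for key, grp in df.groupby(['code', 'colour'])}

print(groups['one', 'white'])
```

The keys are (code, colour) tuples, so lookups work the same way as in the dict(list(...)) version above.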
DataFrameGroupBy.transform calls a function producing a like-indexed DataFrame on each group and returns a DataFrame with the same index as the original object, filled with the transformed values.
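As a hedged illustration on the question's data (amounts copied from the output above; the new column names group_mean and rank_in_group are hypothetical), transform returns one value per original row instead of one row per group, so its result can be assigned straight back as a column:

```python
import pandas as pd

df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white', 'black', 'black', 'white', 'white'],
                   'amount': [-0.117307, 1.653216, -0.457722, 0.579074,
                              -1.683988, 0.567162, 0.817205, -1.277020]})

g = df.groupby(['code', 'colour'])['amount']

# Each transform result is indexed like df, so it lines up with the original rows
df['group_mean'] = g.transform('mean')
# Rank 1 = largest amount within its (code, colour) group
df['rank_in_group'] = g.transform(lambda s: s.rank(ascending=False))

# Sorting by the keys and the rank gives "descending amount within each group"
print(df.sort_values(['code', 'colour', 'rank_in_group']))
```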
Series.transform() calls a function on the Series, producing a Series of transformed values with the same axis length as the original. If a function is used to transform the data, it must either work when passed a Series or when passed to Series.apply.
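A minimal sketch of Series.transform on a small made-up Series (the values are illustrative only): a single function keeps the Series length, and a list of functions produces one column per function.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0])

doubled = s.transform(lambda x: x * 2)   # same length as s
roots = s.transform(np.sqrt)             # element-wise square root
both = s.transform([np.sqrt, np.exp])    # list of functions -> DataFrame, one column each

print(doubled)
print(both)
```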
You can group DataFrame rows into a list by calling pandas.DataFrame.groupby() on the column(s) of interest, selecting the column you want as a list from each group, and then using Series.apply(list) to get the list for every group.
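Applied to this question, a sketch (amounts copied from the output above) that collects the ids of each (code, colour) group into a list. Sorting the frame by amount first is an added step: groupby keeps the incoming row order within each group, so the lists come out in descending order of amount.

```python
import pandas as pd

df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white', 'black', 'black', 'white', 'white'],
                   'amount': [-0.117307, 1.653216, -0.457722, 0.579074,
                              -1.683988, 0.567162, 0.817205, -1.277020]})

# One list of ids per (code, colour) group, in original row order
ids_per_group = df.groupby(['code', 'colour'])['id'].apply(list)

# Same, but with ids ordered by descending amount inside each group
ids_sorted = (df.sort_values('amount', ascending=False)
                .groupby(['code', 'colour'])['id']
                .apply(list))

print(ids_sorted)
```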
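First, group by code and colour, and then apply a custom function that turns each group's id and amount columns into an {id: amount} dict. Note that this keeps each group's original row order; it does not sort by amount: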
df = df.groupby(['code', 'colour']).apply(lambda x:x.set_index('id').to_dict('dict')['amount'])
And then modify the index:
df.index = ['/'.join(i) for i in df.index]
This returns a Series; you can convert it back to a DataFrame with:
df = df.reset_index()
Finally, set the column names:
df.columns=['code/colour','id:amount']
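The index and column steps can also be written with Index.map and Series.reset_index(name=...). This is a sketch on a small hand-built Series shaped like the groupby/apply result (its values are copied from the question, but only two groups are included for brevity):

```python
import pandas as pd

# Shaped like the apply() result: (code, colour) MultiIndex with dict values
idx = pd.MultiIndex.from_tuples([('one', 'black'), ('one', 'white')],
                                names=['code', 'colour'])
s = pd.Series([{1: -0.117307}, {2: 1.653216, 7: 0.817205}], index=idx)

# MultiIndex.map passes each (code, colour) tuple to '/'.join
s.index = s.index.map('/'.join)

# Name the index, then turn the Series into a two-column DataFrame in one step
out = s.rename_axis('code/colour').reset_index(name='id:amount')
print(out)
```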
Result:
In [105]: df
Out[105]:
code/colour id:amount
0 one/black {1: 0.392264412544}
1 one/white {2: 2.13950686015, 7: -0.393002947047}
2 three/black {6: -2.0766612539}
3 three/white {4: -1.18058561325}
4 two/black {5: -1.51959565941}
5 two/white {8: -1.7659863039, 3: -0.595666853895}
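Because the answer above does not sort (note the two/white row), here is one possible variant that also orders each dict by descending amount. It is a sketch, assuming Python 3.7+ (dicts keep insertion order) and using the amounts from the question's output so the result can be compared with the desired table. It builds the dicts with a plain loop over the groups rather than groupby().apply().

```python
import pandas as pd

df = pd.DataFrame({'id': range(1, 9),
                   'code': ['one', 'one', 'two', 'three', 'two', 'three', 'one', 'two'],
                   'colour': ['black', 'white', 'white', 'white', 'black', 'black', 'white', 'white'],
                   'amount': [-0.117307, 1.653216, -0.457722, 0.579074,
                              -1.683988, 0.567162, 0.817205, -1.277020]})

# Sort once, descending; groupby keeps this row order inside each group
ordered = df.sort_values('amount', ascending=False)

# One {id: amount} dict per group, keyed by "code/colour"
result = {
    f'{code}/{colour}': dict(zip(grp['id'].tolist(), grp['amount'].tolist()))
    for (code, colour), grp in ordered.groupby(['code', 'colour'])
}

out = pd.DataFrame({'code/colour': list(result), 'id:amount': list(result.values())})
print(out)
```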