I have a pandas dataframe
x = pd.DataFrame.from_dict({'row':[1, 1, 2, 2, 3, 3, 3], 'add': [1, 2, 3, 4, 5, 6, 7], 'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'], 'take2': ['11', '22', '33', '44', '55', '66', '77'], 'range': [100, 200, 300, 400, 500, 600, 700]})
add range row take1 take2
0 1 100 1 a 11
1 2 200 1 b 22
2 3 300 2 c 33
3 4 400 2 d 44
4 5 500 3 e 55
5 6 600 3 f 66
6 7 700 3 g 77
I want to group it by the row
column, then add up entries in add
column, but take the first entry from take1
and take2
, and select the min and max from range:
add row take1 take2 min_range max_range
0 3 1 a 11 100 200
1 7 2 c 33 300 400
2 18 3 e 55 500 700
Use DataFrameGroupBy.agg
by dict, but then some cleaning is necessary, because get MultiIndex
in columns:
#create a dictionary of column names and functions to apply to that column
d = {'add':'sum', 'take1':'first', 'take2':'first', 'range':['min','max']}
#group by the row column and apply the corresponding aggregation to each
#column as specified in the dictionary d
df = x.groupby('row', as_index=False).agg(d)
#rename some columns
df = df.rename(columns={'first':'', 'sum':''})
df.columns = ['{0[0]}_{0[1]}'.format(x).strip('_') for x in df.columns]
print (df)
row take1 range_min range_max take2 add
0 1 a 100 200 11 3
1 2 c 300 400 33 7
2 3 e 500 700 55 18
Details : Aggregate the columns based by the functions specified in the dictionary :
df = x.groupby('row', as_index=False).agg(d)
row range take2 take1 add min max first first sum 0 1 100 200 11 a 3 1 2 300 400 33 c 7 2 3 500 700 55 e 18
Replacing column names sum
and first
with ''
will lead to
row range take2 take1 add min max 0 1 100 200 11 a 3 1 2 300 400 33 c 7 2 3 500 700 55 e 18
List comprehension on columns by using string formatters will get the desired column names. Assigning it to df.columns
will get the desired output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With