Pandas dataframe group: sum one column, take first element from others

Question

I have a pandas DataFrame:

import pandas as pd

x = pd.DataFrame.from_dict({'row':[1, 1, 2, 2, 3, 3, 3], 'add': [1, 2, 3, 4, 5, 6, 7], 'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'], 'take2': ['11', '22', '33', '44', '55', '66', '77'], 'range': [100, 200, 300, 400, 500, 600, 700]})


   add  range  row take1 take2
0    1    100    1     a    11
1    2    200    1     b    22
2    3    300    2     c    33
3    4    400    2     d    44
4    5    500    3     e    55
5    6    600    3     f    66
6    7    700    3     g    77

I want to group by the row column, sum the entries in the add column, take the first entry of take1 and take2, and take the min and max of range:

   add    row take1 take2  min_range   max_range
0    3      1     a    11    100        200
1    7      2     c    33    300        400
2    18     3     e    55    500        700

jezrael · Accepted Answer

Use DataFrameGroupBy.agg with a dict. Some cleanup is needed afterwards, because the result has a MultiIndex in the columns:

# map each column name to the aggregation(s) to apply to it

d = {'add':'sum', 'take1':'first', 'take2':'first', 'range':['min','max']}

# group by the row column and apply the corresponding aggregation to each
# column as specified in the dictionary d
df = x.groupby('row', as_index=False).agg(d)

# blank out the 'first' and 'sum' labels in the second level of the column MultiIndex
df = df.rename(columns={'first':'', 'sum':''})
# flatten the MultiIndex: join both levels with '_' and strip the trailing '_' left by blank labels
df.columns = ['{0[0]}_{0[1]}'.format(col).strip('_') for col in df.columns]
print(df)
   row take1  range_min  range_max take2  add
0    1     a        100        200    11    3
1    2     c        300        400    33    7
2    3     e        500        700    55   18

Details: first, each column is aggregated with the function(s) specified for it in the dictionary:

df = x.groupby('row', as_index=False).agg(d)

  row range      take2 take1 add
        min  max first first sum
0   1   100  200    11     a   3
1   2   300  400    33     c   7
2   3   500  700    55     e  18
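A minimal sketch, assuming a recent pandas version, that inspects the columns produced by this step. Because 'range' maps to a list of two functions, every column label becomes a (column, function) tuple, which is the MultiIndex the rest of the answer cleans up:

```python
import pandas as pd

# Same data as in the question
x = pd.DataFrame.from_dict({'row': [1, 1, 2, 2, 3, 3, 3],
                            'add': [1, 2, 3, 4, 5, 6, 7],
                            'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
                            'take2': ['11', '22', '33', '44', '55', '66', '77'],
                            'range': [100, 200, 300, 400, 500, 600, 700]})

d = {'add': 'sum', 'take1': 'first', 'take2': 'first', 'range': ['min', 'max']}
df = x.groupby('row', as_index=False).agg(d)

# A single list-valued entry in d is enough to make the columns a MultiIndex
print(type(df.columns).__name__)
# Each label is a (column, function) tuple; the grouping column gets an empty second level
print(df.columns.tolist())
```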

Replacing the second-level column labels sum and first with '' gives:

  row range      take2 take1 add
        min  max                
0   1   100  200    11     a   3
1   2   300  400    33     c   7
2   3   500  700    55     e  18
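The rename step works because DataFrame.rename applies the mapping to the labels on every level of a MultiIndex. A minimal sketch, assuming a recent pandas version, that prints the column labels after the rename so you can see that only 'first' and 'sum' were blanked while 'min' and 'max' survive:

```python
import pandas as pd

# Same data and aggregation dictionary as in the answer
x = pd.DataFrame.from_dict({'row': [1, 1, 2, 2, 3, 3, 3],
                            'add': [1, 2, 3, 4, 5, 6, 7],
                            'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
                            'take2': ['11', '22', '33', '44', '55', '66', '77'],
                            'range': [100, 200, 300, 400, 500, 600, 700]})
d = {'add': 'sum', 'take1': 'first', 'take2': 'first', 'range': ['min', 'max']}
df = x.groupby('row', as_index=False).agg(d)

# The mapping is checked against the labels of both levels; none of the
# first-level names are 'first' or 'sum', so only second-level labels change
df = df.rename(columns={'first': '', 'sum': ''})
print(df.columns.tolist())
```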

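A list comprehension with a format string then joins the two levels of each column label with '_' and strips the trailing '_' left by the empty labels, which gives the desired column names. Assigning the result to df.columns produces the desired output.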
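On pandas 0.25 or later, named aggregation is an alternative that avoids the MultiIndex cleanup entirely and produces the min_range/max_range names from the question directly. This is a different technique from the dict-based answer above; a minimal sketch, assuming a recent pandas version:

```python
import pandas as pd

# Same data as in the question
x = pd.DataFrame.from_dict({'row': [1, 1, 2, 2, 3, 3, 3],
                            'add': [1, 2, 3, 4, 5, 6, 7],
                            'take1': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
                            'take2': ['11', '22', '33', '44', '55', '66', '77'],
                            'range': [100, 200, 300, 400, 500, 600, 700]})

# Named aggregation: each keyword becomes an output column and its value is
# an (input column, aggregation function) pair, so the columns come out flat
df = x.groupby('row', as_index=False).agg(
    add=('add', 'sum'),
    take1=('take1', 'first'),
    take2=('take2', 'first'),
    min_range=('range', 'min'),
    max_range=('range', 'max'),
)
print(df)
```

Because the output names are chosen explicitly, there is no need for the rename and the list comprehension, and the column order follows the order of the keyword arguments.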

Tags: python, pandas, dataframe, group-by

Baron Yugovich
