
pandas dataframe groupby by column position

I have a function that performs a group-by on a pandas DataFrame. The problem is that the DataFrame can have a variable number of columns. I want to aggregate by summing the last column, grouped by the first column. The name of the last column varies, but the name of the first column is fixed.

How can I do this group-by? I tried using iloc and getting the name of the last column with df.columns[-1], but neither approach seemed to work.

Is there a better way to achieve this than renaming the last column to some common value?

add787 asked Feb 08 '18 19:02


2 Answers

df.groupby(df.columns[0])[df.columns[-1]].sum() should work.

Example:

import pandas as pd

df = pd.DataFrame({
    'a': [1,1,2,2],
    'b': [1,2,3,4]
})

df.groupby(df.columns[0])[df.columns[-1]].sum()
#a
#1    3
#2    7
#Name: b, dtype: int64
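Since the question is about a function that receives frames of varying width, the one-liner above could be wrapped as in the sketch below. The helper name sum_last_by_first and the three-column frame with a last column called "total" are made up for illustration and are not from the question; the sketch also assumes the column labels are unique.

```python
import pandas as pd


def sum_last_by_first(df: pd.DataFrame) -> pd.Series:
    """Sum the last column grouped by the first one, whatever their names are.

    Hypothetical helper for illustration; assumes unique column labels.
    """
    first, last = df.columns[0], df.columns[-1]
    return df.groupby(first)[last].sum()


# Illustrative frame: an extra middle column and a differently named last column.
wide = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "x": [10, 20, 30, 40],
    "total": [1, 2, 3, 4],
})

result = sum_last_by_first(wide)
print(result)
```

The returned Series takes its name from whatever the last column happens to be called, so no renaming is needed before grouping.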
Psidom answered Sep 28 '22 18:09


Simply group the Series selected by iloc (data borrowed from @Psidom):

s = df.iloc[:, -1].groupby(df.iloc[:, 0]).sum()
print(s)
a
1    3
2    7
Name: b, dtype: int64
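As a quick check, the positional form can be compared with the label-based form on the same sample data. This is only a sketch: the variable names by_label, by_position and as_frame are chosen here for illustration, and the reset_index() step is an optional extra for when a DataFrame is preferred over a Series.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": [1, 2, 3, 4],
})

# Label-based: look up the first and last column names, then group.
by_label = df.groupby(df.columns[0])[df.columns[-1]].sum()

# Position-based: select both columns as Series with iloc, then group.
by_position = df.iloc[:, -1].groupby(df.iloc[:, 0]).sum()

# Both should carry the same values, index, and dtype.
same = by_label.equals(by_position)
print(same)

# Optional: turn the grouped Series back into a two-column DataFrame.
as_frame = by_position.reset_index()
print(as_frame)
```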
jezrael answered Sep 28 '22 20:09