I have a function that does a group-by on a pandas DataFrame. The problem is that my DataFrame can have a variable number of columns. I want to aggregate by summing the last column, grouped by the first column. The name of the last column varies, but the name of the first column is fixed.
How can I do the group-by? I tried using iloc, and I tried getting the name of the last column with df.columns[-1], but neither approach seems to work.
Is there a better way to do this than renaming the last column to some common name?
Looking up both column labels by position should work:
df.groupby(df.columns[0])[df.columns[-1]].sum()
Example:
import pandas as pd
df = pd.DataFrame({
'a': [1,1,2,2],
'b': [1,2,3,4]
})
df.groupby(df.columns[0])[df.columns[-1]].sum()
# a
# 1    3
# 2    7
# Name: b, dtype: int64
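Because the question mentions wrapping this in a function that receives frames with different column counts, the one-liner can be packaged as a helper. This is a minimal sketch: the function name sum_last_by_first and the sample frames are hypothetical, chosen only to show that the last column's name never has to be known in advance.

```python
import pandas as pd


def sum_last_by_first(df: pd.DataFrame) -> pd.Series:
    """Sum the last column, grouped by the first column (hypothetical helper).

    Both labels are looked up by position via df.columns, so the
    last column's name can differ from frame to frame.
    """
    first, last = df.columns[0], df.columns[-1]
    return df.groupby(first)[last].sum()


# Two sample frames sharing the first column "a" but with a different
# number of columns and a differently named last column.
df2 = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 2, 3, 4]})
df3 = pd.DataFrame({"a": [1, 1, 2, 2], "x": [0, 0, 0, 0], "total": [10, 20, 30, 40]})

print(sum_last_by_first(df2))
print(sum_last_by_first(df3))
```

The returned Series keeps the last column's original name ("b" or "total"), so no renaming is needed.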
Alternatively, group the Series selected by iloc (data borrowed from @Psidom's answer):
s = df.iloc[:, -1].groupby(df.iloc[:, 0]).sum()
print(s)
a
1    3
2    7
Name: b, dtype: int64
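If a two-column DataFrame is more convenient than a Series, the grouped result can be passed through reset_index(). This is a sketch assuming the same sample data as above; it relies on the fact that the Series taken with iloc keep their original names ("a" and "b"), which become the column names of the output.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [1, 2, 3, 4]})

# Select the value column and the grouping column purely by position.
s = df.iloc[:, -1].groupby(df.iloc[:, 0]).sum()

# Move the group keys out of the index into a regular column.
out = s.reset_index()
print(out)
```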