I have a dataframe like the following.
A = [{'ID':1, 'Period':1, 'Variable':21}, {'ID':1,'Period':2, 'Variable':12},
{'ID':2, 'Period':2, 'Variable':14}, {'ID':2, 'Period':3, 'Variable':18}]
df = pd.DataFrame(A)
I would essentially like to do an operation like df.groupby('ID').sum()
to get the sum of the Variable
column, but I need to skip the first period observed for a particular ID. So, for ID=1, I am dropping the observation at period 1, but for ID=2, I am dropping the observation at period 2.
How can I do this?
You can slice within each group to ignore the first row and call sum
:
In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())
Out[46]:
ID
1 12
2 18
Name: Variable, dtype: int64
If you want all the columns:
In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())
Out[47]:
ID Period Variable
ID
1 1 2 12
2 2 3 18
You can use pd.Series.duplicated
to ignore the first occurrence:
res = df[df['ID'].duplicated()].groupby('ID').sum()
print(res)
Period Variable
ID
1 2 12
2 3 18
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With