Sum a column by ID, but skip the first instance?

I have a dataframe like the following.

import pandas as pd

A = [{'ID':1, 'Period':1, 'Variable':21}, {'ID':1,'Period':2, 'Variable':12}, 
      {'ID':2, 'Period':2, 'Variable':14}, {'ID':2, 'Period':3, 'Variable':18}]

df = pd.DataFrame(A)

I would essentially like to do something like df.groupby('ID').sum() to get the sum of the Variable column for each ID, but I need to skip the first period observed for each ID. So, for ID=1, I would drop the observation at period 1, but for ID=2, I would drop the observation at period 2.

How can I do this?

asked Aug 08 '18 10:08 by Pburg


2 Answers

You can slice within each group to ignore the first row and call sum:

In[46]:
df.groupby('ID')['Variable'].apply(lambda x: x.iloc[1:].sum())

Out[46]: 
ID
1    12
2    18
Name: Variable, dtype: int64
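A vectorized alternative (a different technique from the apply/lambda above, offered as a sketch) is GroupBy.cumcount, which numbers the rows inside each ID group 0, 1, 2, ... in their current order. Keeping only rows whose number is greater than 0 drops the first row per ID before the ordinary sum:

```python
import pandas as pd

A = [{'ID': 1, 'Period': 1, 'Variable': 21}, {'ID': 1, 'Period': 2, 'Variable': 12},
     {'ID': 2, 'Period': 2, 'Variable': 14}, {'ID': 2, 'Period': 3, 'Variable': 18}]
df = pd.DataFrame(A)

# cumcount() gives 0 to the first row of each ID, 1 to the second, and so on,
# so "> 0" is True for every row except the first one per ID.
not_first = df.groupby('ID').cumcount() > 0

# Sum Variable over the remaining rows, grouped by ID.
res = df[not_first].groupby('ID')['Variable'].sum()
print(res)
```

One behavioural difference to be aware of: an ID that has only one row disappears from this result entirely, whereas the apply version returns 0 for it (the sum of an empty slice). If you need those zeros, something like res.reindex(df['ID'].unique(), fill_value=0) should restore them.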

If you want all the columns:

In[47]:
df.groupby('ID').apply(lambda x: x.iloc[1:].sum())

Out[47]: 
    ID  Period  Variable
ID                      
1    1       2        12
2    2       3        18
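The output above also sums the ID column itself. In recent pandas releases (2.2 onward, to my knowledge), applying a function to a groupby that still contains the grouping column emits a DeprecationWarning, and later versions are expected to exclude that column, so the ID column may not appear in that output. A sketch that selects the value columns explicitly, which avoids the issue regardless of version:

```python
import pandas as pd

A = [{'ID': 1, 'Period': 1, 'Variable': 21}, {'ID': 1, 'Period': 2, 'Variable': 12},
     {'ID': 2, 'Period': 2, 'Variable': 14}, {'ID': 2, 'Period': 3, 'Variable': 18}]
df = pd.DataFrame(A)

# Selecting the non-key columns first means each group passed to the lambda
# contains only Period and Variable, so the ID key is never summed.
res = df.groupby('ID')[['Period', 'Variable']].apply(lambda g: g.iloc[1:].sum())
print(res)
```

The result is indexed by ID with just the Period and Variable columns, matching the shape of the duplicated() answer below.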
answered Nov 06 '22 13:11 by EdChum


You can use pd.Series.duplicated, which flags every occurrence of an ID except the first, to ignore the first occurrence:

res = df[df['ID'].duplicated()].groupby('ID').sum()

print(res)

    Period  Variable
ID                  
1        2        12
2        3        18
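Both answers treat the first *row* of each ID as the one to skip, so they match "the first period observed" only when the rows are already in Period order within each ID. A small sketch with a deliberately shuffled (hypothetical) version of the question's data shows the difference, and how sorting by ID and Period first fixes it:

```python
import pandas as pd

# The question's data, reordered so the earliest Period is not the first row per ID.
A = [{'ID': 1, 'Period': 2, 'Variable': 12}, {'ID': 2, 'Period': 3, 'Variable': 18},
     {'ID': 1, 'Period': 1, 'Variable': 21}, {'ID': 2, 'Period': 2, 'Variable': 14}]
df = pd.DataFrame(A)

# Without sorting, duplicated() skips whichever row per ID happens to come first.
unsorted = df[df['ID'].duplicated()].groupby('ID')['Variable'].sum()

# Sorting by ID and Period makes "first row" mean "earliest Period".
df_sorted = df.sort_values(['ID', 'Period'])
res = df_sorted[df_sorted['ID'].duplicated()].groupby('ID')['Variable'].sum()

print(unsorted.to_dict())  # {1: 21, 2: 14}
print(res.to_dict())       # {1: 12, 2: 18}
```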
answered Nov 06 '22 15:11 by jpp