How to use groupby.first() with transform function

Tags: python, pandas

I would like to use the groupby.first() function to find the first non-null value of a group and transform that value to each row in the group.

I have tried the following code:

import pandas as pd
import numpy as np
raw_data = {'col1': ['a','a','a','b','b','b','b','b','b','c','c','c','c','c'],
            'col2': [np.nan,np.nan,6,0,2,0,8,2,2,3,0,0,4,5]}
df=pd.DataFrame(raw_data)
df['col3'] = df.groupby('col1')['col2'].transform(lambda x: x.first())
df

I would like to get a df that looks like this:

  col1 col2 col3
    a NaN   6
    a NaN   6
    a 6     6
    b 0     0
    b 2     0
    b 0     0
    b 8     0
    b 2     0
    b 2     0
    c 3     3
    c 0     3
    c 0     3
    c 4     3
    c 5     3

I get the following error: TypeError: first() missing 1 required positional argument: 'offset'

Interestingly, if I run the same code and just swap out first() for sum(), it returns the sum of each group for every row of that group. With first(), however, it raises the error above. Why? Any help would be greatly appreciated!

Will Bachrach asked Aug 22 '19 23:08


1 Answer

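Inside transform, your lambda receives each group as a plain Series, so x.first() calls Series.first, not GroupBy.first. Series.first takes a required offset argument and only makes sense for a Series with a DatetimeIndex, which is why you get the TypeError. Series.sum needs no arguments, so the same lambda works with sum().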

You want GroupBy.first, which can be accessed with the named alias 'first'.

df['col3'] = df.groupby('col1')['col2'].transform('first')
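
As a rough check, here is a self-contained sketch that rebuilds the frame from the question and applies the alias. The col_sum and col3_lambda columns are names made up for illustration. The dropna().iloc[0] lambda is only one possible hand-written equivalent, and it assumes every group contains at least one non-null value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": list("aaabbbbbbccccc"),
    "col2": [np.nan, np.nan, 6, 0, 2, 0, 8, 2, 2, 3, 0, 0, 4, 5],
})

grouped = df.groupby("col1")["col2"]

# The string alias dispatches to GroupBy.first, which skips NaN within each group.
first_per_group = grouped.transform("first")

# A lambda works for sum because Series.sum has no required arguments.
sum_per_group = grouped.transform(lambda s: s.sum())

# Hypothetical hand-written alternative to the alias.
# Assumption: every group has at least one non-null value (otherwise iloc[0] raises).
first_via_lambda = grouped.transform(lambda s: s.dropna().iloc[0])

df["col3"] = first_per_group
df["col_sum"] = sum_per_group
df["col3_lambda"] = first_via_lambda

print(df)
```

With this data, col3 should come out as 6 for group a, 0 for group b, and 3 for group c, matching the desired output in the question. The alias is usually the better choice: it is shorter and does not break on a group that is entirely NaN.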
ALollz answered Sep 28 '22 20:09