I think I'm missing something basic conceptually, but I'm not able to find the answer in the docs.
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3], 'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})
>>> df
   a    b
0  1  5.0
1  1  NaN
2  2  6.0
3  2  NaN
4  3  NaN
5  3  NaN
Using ffill() and then bfill():
>>> df.groupby('a')['b'].ffill().bfill()
0 5.0
1 5.0
2 6.0
3 6.0
4 NaN
5 NaN
Using bfill() and then ffill():
>>> df.groupby('a')['b'].bfill().ffill()
0 5.0
1 5.0
2 6.0
3 6.0
4 6.0
5 6.0
Doesn't the second way break the groupings? Will the first way always make sure that the values are filled in only with other values in that group?
I think you need to run both fills inside each group with apply:
print(df.groupby('a')['b'].apply(lambda x: x.ffill().bfill()))
0 5.0
1 5.0
2 6.0
3 6.0
4 NaN
5 NaN
Name: b, dtype: float64
print(df.groupby('a')['b'].apply(lambda x: x.bfill().ffill()))
0 5.0
1 5.0
2 6.0
3 6.0
4 NaN
5 NaN
Name: b, dtype: float64
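A variant of the same idea, swapping apply for transform. This is a sketch, not the answer's original code: transform runs the lambda once per group and always aligns the result back to the original row index. With apply, newer pandas releases (2.0 and later, where group_keys defaults to True) may prepend the group key to the index for this kind of like-indexed result, so transform is the safer choice if you want to assign the result back as a column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3],
                   'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})

# Both fills run inside each group; the result keeps df's original index.
filled_fb = df.groupby('a')['b'].transform(lambda x: x.ffill().bfill())
filled_bf = df.groupby('a')['b'].transform(lambda x: x.bfill().ffill())

print(filled_fb)
print(filled_bf)
```

Because the result is aligned to df.index, it can be written straight back, e.g. df['b'] = filled_fb.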
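This is because in your code only the first ffill or bfill is SeriesGroupBy.ffill or SeriesGroupBy.bfill. It returns an ordinary Series, so the second call is Series.bfill or Series.ffill, which runs over the whole column and ignores the groups. The first way only looks correct here because the all-NaN group 3 is the last one, so there is nothing after it to backfill from; neither chained version is guaranteed to stay within its group. Compare the output of each grouped call on its own.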
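A quick way to see the explanation above: the grouped fill returns a plain Series, so whatever is chained onto it is an ordinary Series method. This minimal check uses the question's df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3],
                   'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})

grouped = df.groupby('a')['b']
first = grouped.ffill()  # grouped fill: works within each group

print(type(grouped).__name__)  # SeriesGroupBy
print(type(first).__name__)    # Series

# `second` is computed by Series.bfill over the whole column, not per group.
second = first.bfill()
```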
print(df.groupby('a')['b'].ffill())
0 5.0
1 5.0
2 6.0
3 6.0
4 NaN
5 NaN
Name: b, dtype: float64
print(df.groupby('a')['b'].bfill())
0 5.0
1 NaN
2 6.0
3 NaN
4 NaN
5 NaN
Name: b, dtype: float64
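To answer the second part of the question directly: no, the chained ffill().bfill() is not safe in general either. Here is a sketch with a hypothetical df2 (not from the question) where an all-NaN group is followed by a group that has a value. The chained version fills group 3 from group 4, while the per-group transform leaves it as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 3 has no values at all, group 4 does.
df2 = pd.DataFrame({'a': [1, 1, 3, 3, 4, 4],
                    'b': [5, np.nan, np.nan, np.nan, 7, np.nan]})

# Grouped ffill, then a plain Series.bfill over the whole column.
chained = df2.groupby('a')['b'].ffill().bfill()
# Both fills inside each group.
per_group = df2.groupby('a')['b'].transform(lambda x: x.ffill().bfill())

print(chained.tolist())    # group 3 picks up 7.0 from group 4
print(per_group.tolist())  # group 3 stays NaN
```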