(pandas) Why does .bfill().ffill() act differently than ffill().bfill() on groups?

I think I'm missing something basic conceptually, but I'm not able to find the answer in the docs.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3], 'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})
>>> df
   a    b
0  1  5.0
1  1  NaN
2  2  6.0
3  2  NaN
4  3  NaN
5  3  NaN

Using ffill() and then bfill():

>>> df.groupby('a')['b'].ffill().bfill()
0    5.0
1    5.0
2    6.0
3    6.0
4    NaN
5    NaN

Using bfill() and then ffill():

>>> df.groupby('a')['b'].bfill().ffill()
0    5.0
1    5.0
2    6.0
3    6.0
4    6.0
5    6.0

Doesn't the second way break the groupings? Will the first way always make sure that the values are filled in only with other values in that group?

yobogoya asked Dec 24 '22 17:12

1 Answer

I think you need:

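# group_keys=False keeps the original index; newer pandas versions otherwise prepend the group key
print(df.groupby('a', group_keys=False)['b'].apply(lambda x: x.ffill().bfill()))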
0    5.0
1    5.0
2    6.0
3    6.0
4    NaN
5    NaN
Name: b, dtype: float64

print(df.groupby('a', group_keys=False)['b'].apply(lambda x: x.bfill().ffill()))
0    5.0
1    5.0
2    6.0
3    6.0
4    NaN
5    NaN
Name: b, dtype: float64

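The reason is that in your sample only the first ffill or bfill is a groupby method (DataFrameGroupBy.ffill or DataFrameGroupBy.bfill, or the SeriesGroupBy equivalent once a single column is selected). The second call operates on the Series returned by the first. A plain Series has no groups, so the second fill runs across group boundaries and breaks the grouping.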
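
To make the point above concrete, here is a minimal sketch (not part of the original answer) that checks the type returned by the first fill and then regroups before the second fill. The names grouped, step1 and step2 are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3],
                   'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})

grouped = df.groupby('a')['b']
step1 = grouped.bfill()            # group-aware fill

print(type(grouped).__name__)      # SeriesGroupBy
print(type(step1).__name__)        # Series: the grouping is gone at this point

# Regroup by the same key so the second fill also stays inside each group.
step2 = step1.groupby(df['a']).ffill()
print(step2.tolist())              # [5.0, 5.0, 6.0, 6.0, nan, nan]
```

Regrouping by df['a'] works here because step1 keeps the original index, so the key column still lines up row by row.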

print (df.groupby('a')['b'].ffill())
0    5.0
1    5.0
2    6.0
3    6.0
4    NaN
5    NaN
Name: b, dtype: float64

print (df.groupby('a')['b'].bfill())
0    5.0
1    NaN
2    6.0
3    NaN
4    NaN
5    NaN
Name: b, dtype: float64
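
On the question of whether ffill() followed by bfill() is always safe: it is not. It only looked safe in the sample because the empty group came last. The sketch below uses hypothetical data (df2, not from the question) in which the empty group comes first, and compares the chained call with transform, which applies both fills inside each group. Using transform is a suggestion on my part, not something the answer above proposes.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 1 has no values at all, group 2 has one.
df2 = pd.DataFrame({'a': [1, 1, 2, 2],
                    'b': [np.nan, np.nan, 6, np.nan]})

# Only ffill is group-aware here; bfill runs on the plain Series.
leaky = df2.groupby('a')['b'].ffill().bfill()

# Both fills run per group, and the result keeps the original index.
safe = df2.groupby('a')['b'].transform(lambda s: s.ffill().bfill())

print(leaky.tolist())  # [6.0, 6.0, 6.0, 6.0] -> group 2's value leaked into group 1
print(safe.tolist())   # [nan, nan, 6.0, 6.0]
```

So the order of the two fills does not decide whether groups are respected; what matters is whether each fill is called on the groupby object or on the plain Series.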
jezrael answered Dec 28 '22 08:12
