(pandas) Why does .bfill().ffill() act differently than ffill().bfill() on groups?

Question

I think I'm missing something basic conceptually, but I'm not able to find the answer in the docs.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3], 'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})
>>> df
   a    b
0  1  5.0
1  1  NaN
2  2  6.0
3  2  NaN
4  3  NaN
5  3  NaN

Using ffill() and then bfill():

>>> df.groupby('a')['b'].ffill().bfill()
0    5.0
1    5.0
2    6.0
3    6.0
4    NaN
5    NaN

Using bfill() and then ffill():

>>> df.groupby('a')['b'].bfill().ffill()
0    5.0
1    5.0
2    6.0
3    6.0
4    6.0
5    6.0

Doesn't the second way break the groupings? Will the first way always make sure that the values are filled in only with other values in that group?

jezrael · Accepted Answer

I think you need to run both fills inside each group:

print (df.groupby('a')['b'].apply(lambda x: x.ffill().bfill()))
0    5.0
1    5.0
2    6.0
3    6.0
4    NaN
5    NaN
Name: b, dtype: float64

print (df.groupby('a')['b'].apply(lambda x: x.bfill().ffill()))
0    5.0
1    5.0
2    6.0
3    6.0
4    NaN
5    NaN
Name: b, dtype: float64

This is because in your sample only the first ffill or bfill is the groupby method (DataFrameGroupBy.ffill or DataFrameGroupBy.bfill in the docs); the second one works on the Series that the first call returns. That breaks the groups, because a plain Series has no groups.
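To make the point above concrete, here is a small sketch that inspects the types at each step of the chain. It reuses the question's data; the variable names grouped, step1 and step2 are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 2, 2, 3, 3],
                   'b': [5, np.nan, 6, np.nan, np.nan, np.nan]})

grouped = df.groupby('a')['b']   # groupby object: its fills stop at group boundaries
step1 = grouped.bfill()          # the result is an ordinary Series, the grouping is gone

print(type(grouped).__name__)    # SeriesGroupBy
print(type(step1).__name__)      # Series

# The chained call is therefore Series.ffill, which fills down the whole column
step2 = step1.ffill()
print(step2.tolist())            # [5.0, 5.0, 6.0, 6.0, 6.0, 6.0]
```

The last two rows belong to group 3, which has no values of its own, so the 6.0 there was carried over from group 2.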

print (df.groupby('a')['b'].ffill())
0    5.0
1    5.0
2    6.0
3    6.0
4    NaN
5    NaN
Name: b, dtype: float64

print (df.groupby('a')['b'].bfill())
0    5.0
1    NaN
2    6.0
3    NaN
4    NaN
5    NaN
Name: b, dtype: float64
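On the question of whether ffill() first is always safe: it does not appear to be. The sketch below uses a hypothetical frame (df2, not from the question) in which the first group is entirely NaN, and compares the chained call with a variant that runs both fills per group through transform, which should behave like the apply version above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 1 has no values at all, group 2 does
df2 = pd.DataFrame({'a': [1, 1, 2, 2],
                    'b': [np.nan, np.nan, 7, np.nan]})

# Group-aware ffill, followed by a plain Series.bfill that ignores groups
leaky = df2.groupby('a')['b'].ffill().bfill()
print(leaky.tolist())   # [7.0, 7.0, 7.0, 7.0]: group 1 borrowed from group 2

# Both fills run inside each group, so group 1 stays NaN
safe = df2.groupby('a')['b'].transform(lambda s: s.ffill().bfill())
print(safe.tolist())    # [nan, nan, 7.0, 7.0]
```

transform always returns a result aligned to the original index. With apply, recent pandas versions (2.0 and later, as far as I know) may add the group key as an extra index level unless group_keys=False is passed to groupby, so the flat index shown in the apply output above may depend on the pandas version.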

Tags: pandas, group-by, pandas-groupby

yobogoya

