Pandas backfill specific value

I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'val': [np.nan, np.nan, np.nan, np.nan, 15, 1, 5, 2, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 2, 23, 5, 12, np.nan, np.nan, 3, 4, 5]})
df['name'] = ['a']*8 + ['b']*15

df

>>> 
    val name
0   NaN    a
1   NaN    a
2   NaN    a
3   NaN    a
4   15.0   a
5   1.0    a
6   5.0    a
7   2.0    a
8   NaN    b
9   NaN    b
10  NaN    b
11  NaN    b
12  NaN    b
13  NaN    b
14  2.0    b
15  23.0   b
16  5.0    b
17  12.0   b
18  NaN    b
19  NaN    b
20  3.0    b
21  4.0    b
22  5.0    b

For each name, I want to backfill the 3 NaN spots immediately before each run of valid values with -1, so that I end up with:

>>>
    val name
0   NaN     a
1   -1.0    a
2   -1.0    a
3   -1.0    a
4   15.0    a
5   1.0     a
6   5.0     a
7   2.0     a
8   NaN     b
9   NaN     b
10  NaN     b
11  -1.0    b
12  -1.0    b
13  -1.0    b
14  2.0     b
15  23.0    b
16  5.0     b
17  12.0    b
18  -1.0    b
19  -1.0    b
20  3.0     b
21  4.0     b
22  5.0     b

Note that there can be multiple NaN sections per name. If a section has fewer than 3 NaNs, all of them are filled (the backfill covers at most 3).

RSHAP asked Jun 07 '18 20:06



1 Answer

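You can use first_valid_index to get the index label of the first non-null value in each group, then assign -1 to the three rows before it with loc. This relies on the default integer index and only covers the NaN run at the start of each group: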

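# Index label of the first non-null 'val' in each group
idx = df.groupby('name').val.apply(lambda x: x.first_valid_index())
for x in idx:
    # .loc slices include both endpoints, so this hits the 3 rows just before x
    df.loc[x - 3:x - 1, 'val'] = -1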

df
Out[51]: 
     val name
0    NaN    a
1   -1.0    a
2   -1.0    a
3   -1.0    a
4   15.0    a
5    1.0    a
6    5.0    a
7    2.0    a
8    NaN    b
9    NaN    b
10   NaN    b
11  -1.0    b
12  -1.0    b
13  -1.0    b
14   2.0    b
15  23.0    b
16   5.0    b
17  12.0    b
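
One caveat with the loop above: if a group starts with fewer than three NaN, `x - 3` reaches back into the previous group and overwrites its values. A minimal sketch of a group-safe variant, assuming a unique index and the column names 'name' and 'val' from the question (`fill_before_first_valid`, `n`, `fill` and `demo` are made-up names for illustration, not pandas API):

```python
import numpy as np
import pandas as pd

def fill_before_first_valid(df, n=3, fill=-1.0):
    """Fill up to n NaN right before each group's first valid 'val'.

    Hypothetical helper, not part of pandas. Assumes a unique index.
    """
    out = df.copy()
    for _, grp in df.groupby('name'):
        first = grp['val'].first_valid_index()
        if first is None:
            continue  # no valid value in this group
        pos = grp.index.get_loc(first)           # position within the group
        labels = grp.index[max(pos - n, 0):pos]  # clipped at the group start
        if len(labels):
            out.loc[labels, 'val'] = fill
    return out

# Group 'b' has only one NaN before its first valid value
demo = pd.DataFrame({
    'val': [np.nan, np.nan, np.nan, np.nan, 15, np.nan, 2, np.nan, 3],
    'name': ['a'] * 5 + ['b'] * 4,
})
result = fill_before_first_valid(demo)
```

Like the loop, this still leaves later NaN runs inside a group untouched (row 7 in `demo` stays NaN).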

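Update: to handle several NaN runs per group, back-fill with limit=3 and then overwrite the newly filled positions with -1.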

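# Back-fill at most 3 NaN before each valid value, within each group
s = df.groupby('name').val.bfill(limit=3)
# Positions that were NaN originally but are filled now get -1 instead
s.loc[s.notnull() & df.val.isnull()] = -1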
s
Out[59]: 
0      NaN
1     -1.0
2     -1.0
3     -1.0
4     15.0
5      1.0
6      5.0
7      2.0
8      NaN
9      NaN
10     NaN
11    -1.0
12    -1.0
13    -1.0
14     2.0
15    23.0
16     5.0
17    12.0
18    -1.0
19    -1.0
20     3.0
21     4.0
22     5.0
Name: val, dtype: float64
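
If you need this in more than one place, the same idea can be wrapped in a function. This is only a sketch under the question's assumptions: `backfill_marker` and its parameters are hypothetical names, and it returns a new Series instead of modifying df.

```python
import numpy as np
import pandas as pd

def backfill_marker(df, group_col='name', val_col='val', n=3, fill=-1.0):
    """Replace up to n NaN before each valid value, per group, with `fill`.

    Hypothetical helper built on GroupBy.bfill(limit=n).
    """
    filled = df.groupby(group_col)[val_col].bfill(limit=n)
    # True where the original was NaN but the limited back-fill reached it
    newly_filled = filled.notna() & df[val_col].isna()
    return df[val_col].mask(newly_filled, fill)

nan = np.nan
df = pd.DataFrame({
    'val': [nan] * 4 + [15, 1, 5, 2] + [nan] * 6 + [2, 23, 5, 12] + [nan] * 2 + [3, 4, 5],
    'name': ['a'] * 8 + ['b'] * 15,
})
result = backfill_marker(df)
```

Because the back-fill runs inside each group, a value from one name never leaks into the NaN at the end of the previous one.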
BENY answered Oct 18 '22 06:10