
Forward fill only a certain value

I have an array that represents object states, where 0 means the object is off and 1 means the object is on.

import pandas as pd
import numpy as np

s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]
df = pd.DataFrame(s, columns=["s"])
df
      s
0   NaN
1   0.0
2   NaN
3   NaN
4   1.0
5   NaN
6   NaN
7   0.0
8   NaN
9   1.0
10  NaN

I need to forward fill only the 0 values in it, like below.

>>> df_wanted
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

After browsing similar questions here, I compare the ffill-ed and bfill-ed values and assign back with a mask:

mask = (df.ffill() == 0) & (df.bfill() == 1)
df[mask] = 0
df
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

But this doesn't help when a 0 value is not followed by a 1. What would be a more elegant solution that takes such cases into account?
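To make the failing case concrete, here is a minimal sketch of the two-sided mask applied to the same data with one extra trailing 0 that no 1 follows. The extended list and the name `result` are illustrative additions, not part of the original data.

```python
import numpy as np
import pandas as pd

# Same data as above, plus a trailing 0 that is never followed by a 1
# (the last three entries are an illustrative addition).
s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, 0, np.nan, np.nan]
df = pd.DataFrame(s, columns=["s"])

# A NaN is selected only if the previous valid value is 0 AND the next valid value is 1.
mask = (df.ffill() == 0) & (df.bfill() == 1)
result = df.mask(mask, 0)

# Rows 12 and 13 come after a 0 but have no later 1, so bfill leaves them NaN
# and the mask does not select them.
print(result)
```

Rows 2, 3 and 8 are filled as wanted, while rows 12 and 13 stay NaN.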

asked May 22 '21 07:05 by crayxt


2 Answers

mask = (df.ffill() == 0) alone should suffice for your use case.

First, df.ffill() propagates the last valid observation forward, so NaN rows that come after a 0 are filled with 0 and NaN rows that come after a 1 are filled with 1. Comparing the result to 0 selects only the rows whose last valid value is 0; use that as the mask to assign 0 back into df.

Example (a 0 and a few NaNs added to the end of your df):

>>> s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, np.nan, 0, np.nan, np.nan, np.nan]
>>> df = pd.DataFrame(s, columns=["s"])
>>> df
      s
0   NaN
1   0.0
2   NaN
3   NaN
4   1.0
5   NaN
6   NaN
7   0.0
8   NaN
9   1.0
10  NaN
11  NaN
12  0.0
13  NaN
14  NaN
15  NaN
>>> 
>>> 
>>> df[df.ffill() == 0] = 0
>>> df
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN
11  NaN
12  0.0
13  0.0
14  0.0
15  0.0
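If mutating df in place is not desirable, the same idea can be wrapped in a small function that returns a new Series. This is only a sketch: the name `ffill_only` and its `value` parameter are hypothetical, not something pandas provides.

```python
import numpy as np
import pandas as pd

def ffill_only(series: pd.Series, value=0) -> pd.Series:
    """Forward fill NaNs only where the last valid observation equals `value`.

    Hypothetical helper for illustration; not part of pandas.
    """
    filled = series.ffill()
    # Series.where keeps the original entry where the condition is True
    # (last valid value is not `value`, or there is none yet) and takes
    # the forward-filled entry everywhere else.
    return series.where(filled.ne(value), filled)

s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1,
     np.nan, np.nan, 0, np.nan, np.nan, np.nan]
df = pd.DataFrame(s, columns=["s"])

out = ffill_only(df["s"], value=0)
print(out)
```

The original column is left untouched, and passing value=1 would forward fill only the 1s instead.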
answered Oct 18 '22 03:10 by Ank


One way, maybe not very elegant but workable, is to ffill everything and then take the filled value only where your original series was NaN and the ffilled series is 0.

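# Forward fill every gap, then keep a filled value only where the original was NaN and the fill is 0.
sf = df.ffill().values[:, 0]
desired = np.where(np.isnan(s) & (sf == 0), sf, s)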

pandas has a where function too; I'm just more comfortable with numpy, which I find more versatile.
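For comparison, here is a rough sketch of the same selection written both ways, using Series.mask (the inverse of Series.where) for the pandas version. The variable names `ser`, `desired_np` and `desired_pd` are illustrative.

```python
import numpy as np
import pandas as pd

s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]
ser = pd.Series(s, name="s")
sf = ser.ffill()

# NumPy version: take the filled value where the original is NaN and the fill is 0.
desired_np = np.where(ser.isna().to_numpy() & (sf.to_numpy() == 0),
                      sf.to_numpy(), ser.to_numpy())

# pandas version: Series.mask replaces entries where the condition is True.
desired_pd = ser.mask(ser.isna() & sf.eq(0), sf)

print(desired_pd)
```

Both produce the same values; the pandas version keeps the index and the Series name.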

answered Oct 18 '22 01:10 by Sina Meftah