I have an array which represents object states, where 0 means the object is off and 1 means it is on.
import pandas as pd
import numpy as np
s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]
df = pd.DataFrame(s, columns=["s"])
df
s
0 NaN
1 0.0
2 NaN
3 NaN
4 1.0
5 NaN
6 NaN
7 0.0
8 NaN
9 1.0
10 NaN
I need to forward fill only the 0 values in it, like below.
>>> df_wanted
s
0 NaN
1 0.0
2 0.0
3 0.0
4 1.0
5 NaN
6 NaN
7 0.0
8 0.0
9 1.0
10 NaN
After browsing similar questions here, I compared the ffill-ed and bfill-ed values and assigned back with a mask:
mask = (df.ffill() == 0) & (df.bfill() == 1)
df[mask] = 0
df
s
0 NaN
1 0.0
2 0.0
3 0.0
4 1.0
5 NaN
6 NaN
7 0.0
8 0.0
9 1.0
10 NaN
But this doesn't help when a 0 value is not followed by a 1. What would be a more elegant solution that takes such cases into account?
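To make the failure case concrete, here is a minimal sketch. The shortened data and the names `df_tail` and `result_tail` are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last 0 (index 5) is never followed by a 1.
df_tail = pd.DataFrame({"s": [np.nan, 0, np.nan, 1, np.nan, 0, np.nan, np.nan]})

# Same mask as in the question: previous valid value is 0 AND next valid value is 1.
mask = (df_tail.ffill() == 0) & (df_tail.bfill() == 1)

result_tail = df_tail.copy()
result_tail[mask] = 0

# Index 2 is filled, but indices 6 and 7 stay NaN because bfill finds no 1 after them.
print(result_tail)
```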
pandas.DataFrame.fillna() fills NA/NaN/None entries in one or more columns with 0, an empty string, or any other specified value. NaN is considered a missing value.
The ffill() method replaces missing values with the value from the previous row (or the previous column, if the axis parameter is set to 'columns').
Forward filling and backward filling are two approaches to filling missing values. Forward filling fills a missing value with the last valid data point before it; backward filling fills it with the next valid data point after it.
bfill() backward fills the NaN values present in a pandas DataFrame.
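As a small illustration of the difference between the two directions, the following sketch runs both on a short made-up series (the variable names `forward` and `backward` are chosen here for illustration only):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 0, np.nan, 1, np.nan])

forward = s.ffill()   # each NaN takes the last valid value before it
backward = s.bfill()  # each NaN takes the next valid value after it

# The leading NaN survives ffill (nothing before it);
# the trailing NaN survives bfill (nothing after it).
print(pd.DataFrame({"s": s, "ffill": forward, "bfill": backward}))
```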
mask = (df.ffill() == 0)
should be sufficient for your use case.
First, df.ffill() propagates the last valid observation forward, so NaN rows that come after a 0 are filled with 0, and NaN rows that come after a 1 are filled with 1. Comparing the result to 0 selects only the rows that hold 0 after filling; use that as the mask to get your final df.
Example: (Added a 0 and few NaNs to the end of your df)
>>> s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, np.nan, 0, np.nan, np.nan, np.nan]
>>> df = pd.DataFrame(s, columns=["s"])
>>> df
s
0 NaN
1 0.0
2 NaN
3 NaN
4 1.0
5 NaN
6 NaN
7 0.0
8 NaN
9 1.0
10 NaN
11 NaN
12 0.0
13 NaN
14 NaN
15 NaN
>>>
>>>
>>> df[df.ffill() == 0] = 0
>>> df
s
0 NaN
1 0.0
2 0.0
3 0.0
4 1.0
5 NaN
6 NaN
7 0.0
8 0.0
9 1.0
10 NaN
11 NaN
12 0.0
13 0.0
14 0.0
15 0.0
One way, maybe not very elegant but it works, is to ffill everything and then take the filled value only where your original series was NaN and your ffilled series is 0.
sf = df.ffill().values[:, 0]
desired = np.where(np.isnan(s) & (sf==0), sf, s)
pandas has a where function too; I'm just more comfortable with numpy since it's more versatile.
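For comparison, a pandas-only version of the same idea might look like the sketch below. It is one possible way to write it with `Series.where`, not necessarily how the answer's author would; the names `ser` and `filled` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]
ser = pd.Series(s, name="s")
filled = ser.ffill()

# Series.where keeps the original value where the condition is True and takes
# `other` (here the forward-filled value) elsewhere. The condition is False
# only for rows that were NaN and became 0 after forward filling.
desired = ser.where(ser.notna() | (filled != 0), filled)
print(desired)
```

Because a NaN compared with `!=` evaluates to True, the leading NaN (which has nothing before it to fill from) is left untouched.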