Forward fill only a certain value

Tags: python, pandas, numpy, ffill

I have an array that represents object states, where 0 means the object is off and 1 means it is on.

import pandas as pd
import numpy as np

s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]
df = pd.DataFrame(s, columns=["s"])
df
      s
0   NaN
1   0.0
2   NaN
3   NaN
4   1.0
5   NaN
6   NaN
7   0.0
8   NaN
9   1.0
10  NaN

I need to forward fill only the 0 values in it, like below.

>>> df_wanted
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

After browsing similar questions here, I ended up comparing the ffill-ed and bfill-ed values and assigning back with a mask:

mask = (df.ffill() == 0) & (df.bfill() == 1)
df[mask] = 0
df
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

But this does not help if a 0 value is not followed by a 1. What would be a more elegant solution that takes such cases into account?
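To make the failing case concrete, here is a minimal sketch. It uses the same data with a trailing 0 and two NaNs appended; those extra values are assumed purely for illustration. Because nothing follows the last 0, `bfill()` leaves those rows as NaN and the combined mask never selects them.

```python
import numpy as np
import pandas as pd

# Original data plus a trailing 0 that is never followed by a 1 (illustrative values)
s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, 0, np.nan, np.nan]
df = pd.DataFrame(s, columns=["s"])

# True only where the previous valid value is 0 AND the next valid value is 1
mask = (df.ffill() == 0) & (df.bfill() == 1)
df[mask] = 0

# Rows 12 and 13 come after the last 0 but have no following 1, so they stay NaN
print(df)
```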

asked May 22 '21 07:05

crayxt

2 Answers

mask = (df.ffill() == 0) alone should suffice for your use case.

First, df.ffill() propagates the last valid observation forward, so NaN rows that follow a 0 are filled with 0 and NaN rows that follow a 1 are filled with 1. Comparing the result to 0 selects only the rows whose last valid value is 0; use that as the mask and assign 0 to get your final df.

Example (with a 0 and a few NaNs added to the end of your df):

>>> s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, np.nan, 0, np.nan, np.nan, np.nan]
>>> df = pd.DataFrame(s, columns=["s"])
>>> df
      s
0   NaN
1   0.0
2   NaN
3   NaN
4   1.0
5   NaN
6   NaN
7   0.0
8   NaN
9   1.0
10  NaN
11  NaN
12  0.0
13  NaN
14  NaN
15  NaN
>>> # Assign 0 wherever the forward-filled value is 0
>>> df[df.ffill() == 0] = 0
>>> df
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN
11  NaN
12  0.0
13  0.0
14  0.0
15  0.0
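If you would rather not modify the frame in place, the same idea can be wrapped in a small helper. This is only a sketch: `ffill_only` is a hypothetical name, not a pandas function, and it assumes the data is a single Series.

```python
import numpy as np
import pandas as pd

def ffill_only(series: pd.Series, value) -> pd.Series:
    """Forward fill NaN gaps only where the last valid observation equals `value`."""
    filled = series.ffill()
    # Replace a position only if it was missing and the carried-forward value matches
    return series.mask(series.isna() & (filled == value), value)

s = pd.Series([np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, np.nan, 0, np.nan])
result = ffill_only(s, 0)
print(result)
```

`Series.mask` returns a new Series, so `s` itself is left unchanged.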

answered Oct 18 '22 03:10

Ank

One way, maybe not very elegant but workable, is to ffill everything and then take the filled value only where your original series was NaN and your ffilled series is 0.

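# Forward fill everything, then take the column as a 1-D NumPy array
sf = df.ffill().values[:, 0]
# Use the filled value only where the original is NaN and the filled value is 0
desired = np.where(np.isnan(s) & (sf==0), sf, s)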

pandas has a where function too; I'm just more comfortable with numpy since I find it more versatile.
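For comparison, a pandas-only version might look like the sketch below. It assumes the single-column frame from the question. `Series.where` keeps values where the condition is True and substitutes the second argument elsewhere, so the condition is written as "the filled value is not 0".

```python
import numpy as np
import pandas as pd

s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]
df = pd.DataFrame(s, columns=["s"])

filled = df["s"].ffill()
# Keep the original value unless the last valid observation is 0; in that case use 0.
# NaN != 0 evaluates to True, so leading NaNs and NaNs after a 1 are left untouched.
desired = df["s"].where(filled != 0, 0)
print(desired)
```

Unlike `np.where`, this returns a Series that keeps the original index.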

answered Oct 18 '22 01:10

Sina Meftah
