I have a dataframe like the following:
import numpy as np
import pandas as pd
data={'col1':[1,3,3,1,2,3,2,2, 1], 'col2':[np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan]}
df=pd.DataFrame(data,columns=['col1', 'col2'])
print(df)
col1 col2
0 1 NaN
1 3 1.0
2 3 NaN
3 1 1.0
4 2 NaN
5 3 NaN
6 2 NaN
7 2 2.0
8 1 NaN
I am trying to make a third column that fills in the NaN values in col2 with 1.0 wherever col2 is 1.0 or the most recent non-NaN value above it in col2 is 1.0. The final dataframe would look like this:
col1 col2 col3
0 1 NaN NaN
1 3 1.0 1.0
2 3 NaN 1.0
3 1 1.0 1.0
4 2 NaN 1.0
5 3 NaN 1.0
6 2 NaN 1.0
7 2 2.0 2.0
8 1 NaN NaN
The first approach I tried was:
df['col3'] = ((df['col2']== 1) | ((df['col2'].shift()== 1))).astype('int')
This leaves me with this dataframe:
col1 col2 col3
0 1 NaN 0
1 3 1.0 1
2 3 NaN 1
3 1 1.0 1
4 2 NaN 1
5 3 NaN 0
6 2 NaN 0
7 2 2.0 0
8 1 NaN 0
This fills the first missing value after each 1.0, but does not continue filling the missing values that follow it. I also tried the np.where() function and got the same results.
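To see why the first attempt stops after one row, it helps to print the shifted column next to the original. The sketch below rebuilds only col2 from the question and reuses the same condition; the names prev and flag are just illustrative.

```python
import numpy as np
import pandas as pd

col2 = pd.Series([np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan])

# shift() moves every value down by exactly one row, so each row can only
# compare against the ORIGINAL value directly above it.
prev = col2.shift()

# Same test as the first attempt. Row 5's neighbour (row 4) is still NaN in the
# original column, so the condition is False from row 5 onward.
flag = ((col2 == 1) | (prev == 1)).astype(int)

print(pd.DataFrame({'col2': col2, 'prev': prev, 'flag': flag}))
```

Because shift() never sees values it would have filled itself, a chain of several NaNs cannot be propagated this way; something that carries the last observed value forward is needed instead.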
Is there a way to write this in pandas where it fixes multiple instances in a row?
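You can use np.where on the forward-filled column. ffill() carries the last non-NaN value of 'col2' down through the NaNs that follow it, so checking where the forward-fill equals 1 marks every row in a run that started with 1.0. Fill 1 where that is True, and fall back to the original value of 'col2' where it is False: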
df['col2'] = np.where(df['col2'].ffill() == 1, 1, df['col2'])
The resulting output:
col1 col2
0 1 NaN
1 3 1.0
2 3 1.0
3 1 1.0
4 2 1.0
5 3 1.0
6 2 1.0
7 2 2.0
8 1 NaN
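The question asks for a new col3 rather than overwriting col2, so here is a minimal sketch of the same forward-fill idea writing to a separate column. It also shows Series.mask as a pandas-only equivalent of the np.where call; the intermediate name last_seen and the extra column col3_alt are only illustrative.

```python
import numpy as np
import pandas as pd

data = {'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1],
        'col2': [np.nan, 1, np.nan, 1, np.nan, np.nan, np.nan, 2, np.nan]}
df = pd.DataFrame(data, columns=['col1', 'col2'])

# ffill() propagates the last non-NaN value downward, so every row knows the
# most recent observed value of col2 at or above it (leading NaNs stay NaN).
last_seen = df['col2'].ffill()

# Same rule as the np.where answer, but written to a new column so col2 is kept.
df['col3'] = np.where(last_seen == 1, 1, df['col2'])

# Pandas-only equivalent: replace values with 1 wherever the condition is True.
df['col3_alt'] = df['col2'].mask(last_seen == 1, other=1)

print(df)
```

Both new columns should match col3 in the expected output from the question, while col2 keeps its original NaNs. Rows 0 and 8 stay NaN because nothing above row 0 has been observed yet, and the last value seen before row 8 is 2.0.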