I am trying to fill missing values with previous value but only if the previous value is repeating? Sample DF:
Index Columns
0 1978.0
1 1918.0
2 1918.0
3 1918.0
4 NaN
5 NaN
6 NaN
7 1853.0
8 1831.0
9 NaN
for the above dataframe replace NaN at index 4,5,6 with 1918.0 and leave NaN at index 8 as NaN.
Desired Output 1:
Index Columns
0 1978.0
1 1918.0
2 1918.0
3 1918.0
4 1918.0
5 1918.0
6 1918.0
7 1853.0
8 1831.0
9 NaN
And also it would be great if I could get number of instances it occurred like that out of all NaN value. ie; Sample DF has 4 NaN values in which 3 NaN values occured like that.
Desired Output 2:
Column_Name : Columns
Total_NaN_count : 4
NaN_values_with_previous_elements_repeating : 3
Please let me know if there is any way to get this.
Thanks
Here's a NumPy way working on the underlying array data for performance and convenience -
# Extract array data which being a view lets us modify the original
# dataframe later on just by modifying it
a = df.Columns.values
# Indices of NaN positions that also have repeating values preceding to them
idx = np.flatnonzero(np.r_[False,False,a[1:-1] == a[:-2]] & np.isnan(a))
# Finally assign previous values for all those places
a[idx] = a[idx-1]
You can select the dataframe with condition and ffill on that
cond = df['Columns'].shift(1) == df['Columns'].shift(2)
df.loc[cond] = df.loc[cond].ffill()
Columns
0 1978.0
1 1918.0
2 1918.0
3 1918.0
4 1918.0
6 1853.0
7 1831.0
8 NaN
Update: This handles the new test case
cond = (df.Columns.shift(1) == df.Columns.shift(2)) | (df.Columns.shift(-2).notnull())
df.loc[:] = df.fillna(df.loc[cond].ffill())
You get
Columns
0 1978.0
1 1918.0
2 1918.0
3 1918.0
4 1918.0
5 1918.0
6 1918.0
7 1853.0
8 1831.0
9 NaN
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With