Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to check if data is missing after two or more repeating values in pandas and replace missing value with previous value?

I am trying to fill missing values with previous value but only if the previous value is repeating? Sample DF:

Index Columns
0    1978.0
1    1918.0
2    1918.0
3    1918.0
4       NaN
5       NaN
6       NaN
7    1853.0
8    1831.0
9       NaN

for the above dataframe replace NaN at index 4,5,6 with 1918.0 and leave NaN at index 8 as NaN.

Desired Output 1:

Index Columns
0    1978.0
1    1918.0
2    1918.0
3    1918.0
4    1918.0
5    1918.0
6    1918.0
7    1853.0
8    1831.0
9       NaN

And also it would be great if I could get number of instances it occurred like that out of all NaN value. ie; Sample DF has 4 NaN values in which 3 NaN values occured like that.

Desired Output 2:

Column_Name  : Columns
Total_NaN_count : 4
NaN_values_with_previous_elements_repeating : 3

Please let me know if there is any way to get this.

Thanks

like image 773
Dr.Who Avatar asked Dec 13 '25 17:12

Dr.Who


2 Answers

Here's a NumPy way working on the underlying array data for performance and convenience -

# Extract array data which being a view lets us modify the original
# dataframe later on just by modifying it
a = df.Columns.values

# Indices of NaN positions that also have repeating values preceding to them
idx = np.flatnonzero(np.r_[False,False,a[1:-1] == a[:-2]] & np.isnan(a))

# Finally assign previous values for all those places
a[idx] = a[idx-1]
like image 139
Divakar Avatar answered Dec 16 '25 13:12

Divakar


You can select the dataframe with condition and ffill on that

cond = df['Columns'].shift(1) == df['Columns'].shift(2)
df.loc[cond] = df.loc[cond].ffill()

    Columns
0   1978.0
1   1918.0
2   1918.0
3   1918.0
4   1918.0
6   1853.0
7   1831.0
8   NaN

Update: This handles the new test case

cond = (df.Columns.shift(1) == df.Columns.shift(2)) | (df.Columns.shift(-2).notnull())
df.loc[:] = df.fillna(df.loc[cond].ffill())

You get

    Columns
0   1978.0
1   1918.0
2   1918.0
3   1918.0
4   1918.0
5   1918.0
6   1918.0
7   1853.0
8   1831.0
9   NaN
like image 31
Vaishali Avatar answered Dec 16 '25 14:12

Vaishali