How to check if data is missing after two or more repeating values in pandas and replace missing value with previous value?

Question

I am trying to fill missing values with the previous value, but only if the previous value is repeated. Sample DF:

Index Columns
0    1978.0
1    1918.0
2    1918.0
3    1918.0
4       NaN
5       NaN
6       NaN
7    1853.0
8    1831.0
9       NaN

For the above dataframe, replace the NaN at indices 4, 5 and 6 with 1918.0 and leave the NaN at index 9 as NaN.

Desired Output 1:

Index Columns
0    1978.0
1    1918.0
2    1918.0
3    1918.0
4    1918.0
5    1918.0
6    1918.0
7    1853.0
8    1831.0
9       NaN

It would also be great if I could get the number of NaN values that occur like that out of all NaN values, i.e. the sample DF has 4 NaN values, of which 3 follow a repeating value.

Desired Output 2:

Column_Name  : Columns
Total_NaN_count : 4
NaN_values_with_previous_elements_repeating : 3

Please let me know if there is any way to get this.

Thanks

Divakar · Accepted Answer

Here's a NumPy approach that works on the underlying array data, for performance and convenience:

import numpy as np

# Extract the array data; since it is a view, modifying it also
# modifies the original dataframe
a = df.Columns.values

# Indices of NaN positions that are directly preceded by two equal values
# (in a run of consecutive NaNs, only the first one satisfies this)
idx = np.flatnonzero(np.r_[False,False,a[1:-1] == a[:-2]] & np.isnan(a))

# Finally, assign the previous value at each of those positions
a[idx] = a[idx-1]
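
The sample in the question contains a run of three consecutive NaNs, so a variant that carries the last valid value across the whole run may be useful. The following is only a sketch and not part of the original answer. It assumes that "repeating" means the last non-NaN value equals the element directly before it, and the names `last_valid`, `is_repeat`, `fill_mask` and `filled` are made up for illustration. It works on a standalone copy of the data instead of a view of the dataframe.

```python
import numpy as np

# Sample column from the question, as a plain float array
a = np.array([1978.0, 1918.0, 1918.0, 1918.0, np.nan, np.nan, np.nan,
              1853.0, 1831.0, np.nan])

isnan = np.isnan(a)
pos = np.arange(a.size)

# For every position, the index of the most recent non-NaN element
# (-1 if no valid element has been seen yet)
last_valid = np.maximum.accumulate(np.where(isnan, -1, pos))

# True where a value equals the element directly before it
# (assumption: this is what "repeating" means)
is_repeat = np.r_[False, a[1:] == a[:-1]]

# NaN positions whose most recent valid value was itself a repeat
fill_mask = isnan & (last_valid >= 0) & is_repeat[last_valid]

# Fill a copy so the input array stays untouched
filled = a.copy()
filled[fill_mask] = a[last_valid[fill_mask]]

print(filled)
```

To write the result back, something like `df['Columns'] = filled` should work, assuming `a` was taken from that column.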

Vaishali · Answer

You can select the rows that match a condition and forward-fill within that selection:

cond = df['Columns'].shift(1) == df['Columns'].shift(2)
df.loc[cond] = df.loc[cond].ffill()

    Columns
0   1978.0
1   1918.0
2   1918.0
3   1918.0
4   1918.0
6   1853.0
7   1831.0
8   NaN

Update: This handles the new test case

cond = (df.Columns.shift(1) == df.Columns.shift(2)) | (df.Columns.shift(-2).notnull())
df.loc[:] = df.fillna(df.loc[cond].ffill())
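
Neither answer covers the counts asked for in "Desired Output 2", so here is a hedged pandas sketch that produces both the filled column and the two counts. It is not taken from either answer. It assumes a NaN qualifies when the last valid value before it ends a run of at least two equal values, and the names `run_id`, `run_len`, `prev_run_len`, `fill_mask`, `out` and `summary` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Columns": [1978.0, 1918.0, 1918.0, 1918.0, np.nan,
                               np.nan, np.nan, 1853.0, 1831.0, np.nan]})
s = df["Columns"]

# Label runs of consecutive equal values; NaN never equals NaN,
# so every NaN ends up in a run of its own
run_id = (s != s.shift()).cumsum()

# Length of the current run up to and including each position
run_len = s.groupby(run_id).cumcount() + 1

# Carry the run length of the last valid value forward over the NaNs after it
prev_run_len = run_len.where(s.notna()).ffill()

# NaNs whose preceding value occurred at least twice in a row (assumed rule)
fill_mask = s.isna() & (prev_run_len >= 2)

# Replace only the selected NaNs with the forward-filled value
out = s.mask(fill_mask, s.ffill())

summary = {
    "Column_Name": "Columns",
    "Total_NaN_count": int(s.isna().sum()),
    "NaN_values_with_previous_elements_repeating": int(fill_mask.sum()),
}

print(out)
print(summary)
```

Changing the threshold in `prev_run_len >= 2` would require a longer run of equal values before a NaN gets filled.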

