Remove lesser than K consecutive NaNs from pandas DataFrame

Question

I am working with time series data. I am having trouble removing runs of consecutive NaNs whose length is less than or equal to a threshold from a DataFrame column. I looked at some related questions, such as:

Identifying consecutive NaN's with pandas : identifies where consecutive NaNs occur and how many there are.

Pandas: run length of NaN holes : outputs a run-length encoding of the NaNs.

There are many others along these lines, but none of them actually shows how to remove the NaNs once they have been identified.

I found one similar solution, but it is in R : How to remove more than 2 consecutive NA's in a column?

I want a solution in Python.

So here is the example:

Here is my dataframe column:

If k = 3, my output should be:

How can I remove runs of consecutive NaNs whose length is less than or equal to some threshold (k)?
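The sample column and the expected output are missing from this copy of the question. As a sketch only, a frame consistent with the outputs shown in the answers can be rebuilt by taking the values printed there and placing NaNs at the row labels that are absent from that output (2 to 4 and 10 to 11). This reconstruction is an assumption, not the asker's original data.

```python
import numpy as np
import pandas as pd

# Reconstructed sample (assumption): values copied from the answers' output,
# NaNs placed at the row labels missing from it (2-4 and 10-11).
df = pd.DataFrame({
    "a": [36.45, 35.45, np.nan, np.nan, np.nan, 37.21, 35.63, 36.45,
          34.65, 31.45, np.nan, np.nan, 36.71, 35.55,
          np.nan, np.nan, np.nan, np.nan, 37.71]
})

# Three NaN runs of length 3, 2 and 4; with k = 3 only the last one survives.
print(df["a"].isna().sum())  # 9
```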

cs95 · Accepted Answer

There are a few ways, but this is how I've done it:

Determine groups of consecutive numbers using a neat cumsum trick
Use groupby + transform to determine the size of each group
Identify groups of NaNs that are within the threshold
Filter them out with boolean indexing.

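k = 3
i = df.a.isnull()  # True where a is NaN
# i.ne(i.shift()).cumsum() labels each run of consecutive equal flags;
# transform('size') then gives every row the length of its run
run_size = df.groupby(i.ne(i.shift()).cumsum().values).a.transform('size')
m = ~(run_size.le(k) & i)  # False only for NaN rows in runs of length <= k

df[m]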

        a
0   36.45
1   35.45
5   37.21
6   35.63
7   36.45
8   34.65
9   31.45
12  36.71
13  35.55
14    NaN
15    NaN
16    NaN
17    NaN
18  37.71

You can run df = df[m].reset_index(drop=True) at the end if you want a monotonically increasing integer index. Note that reset_index returns a new DataFrame, so its result has to be assigned.
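The four steps above can also be sketched as a small reusable function. The function name drop_short_nan_runs and the toy frame are hypothetical, introduced only for illustration; the logic follows the mask approach from this answer.

```python
import numpy as np
import pandas as pd

def drop_short_nan_runs(df, col, k):
    """Drop rows that belong to NaN runs of length <= k in df[col]."""
    is_nan = df[col].isna()
    # A new run starts wherever the NaN flag differs from the previous row.
    run_id = is_nan.ne(is_nan.shift()).cumsum()
    # Length of the run each row belongs to.
    run_len = is_nan.groupby(run_id).transform("size")
    # Keep everything except NaN rows that sit in a short run.
    keep = ~(is_nan & run_len.le(k))
    return df[keep].reset_index(drop=True)

# Toy data (hypothetical): one NaN run of length 1 and one of length 3.
toy = pd.DataFrame({"a": [1.0, np.nan, 2.0, np.nan, np.nan, np.nan, 3.0]})
result = drop_short_nan_runs(toy, "a", k=1)
print(result)
```

With k=1 only the single isolated NaN is dropped; the run of three NaNs is longer than the threshold and stays in place.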

Allen · Answer

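You can create an indicator column to count the consecutive NaNs: the cumulative count of non-NaN values puts each value and the NaNs that follow it into one group.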

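import pandas as pd

k = 3
(
    # each group holds one non-NaN value (if any) and the NaN run after it
    df.groupby(pd.notna(df.a).cumsum())
    # drop the group's NaNs only when there are k or fewer of them
    .apply(lambda x: x.dropna() if pd.isna(x.a).sum() <= k else x)
    .reset_index(drop=True)
)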

Out[375]: 
        a
0   36.45
1   35.45
2   37.21
3   35.63
4   36.45
5   34.65
6   31.45
7   36.71
8   35.55
9     NaN
10    NaN
11    NaN
12    NaN
13  37.71
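To see why grouping on the cumulative count of non-NaN values works, it may help to print the grouping key on a small example. This is only an illustrative sketch; the toy frame and the variable names key and nan_count are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one NaN run of length 1 and one of length 3.
toy = pd.DataFrame({"a": [1.0, np.nan, 2.0, np.nan, np.nan, np.nan, 3.0]})

# The key increases at every non-NaN row, so each NaN inherits the key
# of the value that precedes it.
key = toy["a"].notna().cumsum()
print(key.tolist())  # [1, 1, 2, 2, 2, 2, 3]

# Number of NaNs per group, which is what the lambda compares against k.
nan_count = toy["a"].isna().groupby(key).sum()
print(nan_count.tolist())  # [1, 3, 0]
```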

Avani Sharma
