Count distinct strings in a rolling window that includes NaN using pandas

I would like to compute a rolling count of distinct values over a window of at most 36 rows. NaN values should not be counted as distinct values, so the count should start at 0 when the first value is NaN. My DataFrame looks like this:

Input:

val
NaN
 1
 1
NaN
 2
 1
 3
NaN
 5

Code:

b = a.rolling(36,min_periods=1).apply(lambda x: len(np.unique(x))).astype(int)

It gives me:

Val     count
NaN       1
 1        2
 1        2
NaN       3
 2        4
 1        4
 3        5
NaN       6
 5        7

Expected Output:

Val     count
NaN       0
 1        1
 1        1
NaN       1
 2        2
 1        2
 3        3
NaN       3
 5        4

asked Dec 08 '25 20:12 by noishi


1 Answer

You can filter out the NaN values inside each window before counting, then fill the leading NaN result with 0:

df.val.rolling(36,min_periods=1).apply(lambda x: len(np.unique(x[~np.isnan(x)]))).fillna(0)
Out[35]: 
0    0.0
1    1.0
2    1.0
3    1.0
4    2.0
5    2.0
6    3.0
7    3.0
8    4.0
Name: val, dtype: float64
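
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the question's input; the helper name `count_distinct_non_nan`, the `raw=True` argument, and the final `.astype(int)` are my additions for illustration, not part of the original one-liner.

```python
import numpy as np
import pandas as pd

# Rebuild the question's input column.
df = pd.DataFrame({"val": [np.nan, 1, 1, np.nan, 2, 1, 3, np.nan, 5]})


def count_distinct_non_nan(window):
    """Count distinct values in one window, ignoring NaN (hypothetical helper)."""
    # raw=True below means `window` arrives as a NumPy array.
    return len(np.unique(window[~np.isnan(window)]))


df["count"] = (
    df["val"]
    .rolling(36, min_periods=1)
    .apply(count_distinct_non_nan, raw=True)
    # Windows with no non-NaN observation come back as NaN; treat them as 0.
    .fillna(0)
    # Assumption: integer counts are wanted, as in the expected output.
    .astype(int)
)

print(df)
```

The `.fillna(0)` step matters because `min_periods=1` counts only non-NaN observations, so a window containing nothing but NaN yields NaN rather than calling the function.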

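The reason the original code overcounts is that NaN never compares equal to itself, so np.unique (in the NumPy version used here) keeps every NaN as a separate value: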

np.unique([np.nan]*2)
Out[38]: array([nan, nan])

np.nan==np.nan
Out[39]: False
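
As a small sketch of the same point outside the rolling window: masking NaN first gives a count that does not depend on how `np.unique` treats repeated NaN, which may differ between NumPy versions. The sample array and variable names are illustrative assumptions; `Series.nunique()` is shown as an alternative because it drops NaN by default.

```python
import numpy as np
import pandas as pd

# NaN is never equal to itself under IEEE 754 comparison rules.
nan_equal = np.nan == np.nan

# Illustrative sample with repeated NaN and repeated real values.
values = np.array([np.nan, 1.0, 1.0, np.nan, 2.0])

# Mask NaN before calling np.unique so only real values are counted.
distinct_non_nan = np.unique(values[~np.isnan(values)])

# pandas alternative: nunique() excludes NaN by default (dropna=True).
n_unique = pd.Series(values).nunique()

print(nan_equal, distinct_non_nan, n_unique)
```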

answered Dec 10 '25 09:12 by BENY