What is the best way to find the maximum number of consecutive NaN values in a NumPy array?
Examples:
from numpy import nan
Input 1: [nan, nan, nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, nan, 0.16]
Output 1: 3
Input 2: [nan, nan, 2, 1, 1, nan, nan, nan, nan, 0.101, nan, 0.16]
Output 2: 4
Here's one approach -
import numpy as np

def max_repeatedNaNs(a):
    # Mask of NaNs, padded with False on both sides so every NaN run has clear edges
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    else:
        # Count of NaNs in each NaN group: falling-edge positions minus
        # rising-edge positions. Then return the max count as output.
        c = np.flatnonzero(mask[1:] < mask[:-1]) - \
            np.flatnonzero(mask[1:] > mask[:-1])
        return c.max()
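A quick check against the two example inputs from the question (a minimal usage sketch; the names in1 and in2 are just placeholders):
in1 = np.array([np.nan, np.nan, np.nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, np.nan, 0.16])
in2 = np.array([np.nan, np.nan, 2, 1, 1, np.nan, np.nan, np.nan, np.nan, 0.101, np.nan, 0.16])

print(max_repeatedNaNs(in1))  # expected: 3
print(max_repeatedNaNs(in2))  # expected: 4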
Here's an improved version -
def max_repeatedNaNs_v2(a):
    mask = np.concatenate(([False], np.isnan(a), [False]))
    if not mask.any():
        return 0
    else:
        # Indices where the mask changes value: even positions mark run starts,
        # odd positions mark run ends, so paired differences are the run lengths.
        idx = np.nonzero(mask[1:] != mask[:-1])[0]
        return (idx[1::2] - idx[::2]).max()
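To see how the pairing works, here is what the intermediate idx array looks like for the second example input (an illustrative sketch, not part of the original answer):
a = np.array([np.nan, np.nan, 2, 1, 1, np.nan, np.nan, np.nan, np.nan, 0.101, np.nan, 0.16])
mask = np.concatenate(([False], np.isnan(a), [False]))
idx = np.nonzero(mask[1:] != mask[:-1])[0]
print(idx)                   # [ 0  2  5  9 10 11] -> (start, end) pairs: (0, 2), (5, 9), (10, 11)
print(idx[1::2] - idx[::2])  # [2 4 1] -> run lengths; the max, 4, is the answer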
Benchmarking in response to @pltrdy's comment -
In [77]: a = np.random.rand(10000)
In [78]: a[np.random.choice(range(len(a)),size=1000,replace=0)] = np.nan
In [79]: %timeit contiguous_NaN(a) #@pltrdy's solution
100 loops, best of 3: 15.8 ms per loop
In [80]: %timeit max_repeatedNaNs(a)
10000 loops, best of 3: 103 µs per loop
In [81]: %timeit max_repeatedNaNs_v2(a)
10000 loops, best of 3: 86.4 µs per loop
I don't know if you have numba, but it's very handy (and fast) for problems like this:
import numba as nb
import numpy as np
import math

@nb.njit   # also works without the decorator, but then it's several orders of magnitude slower
def max_consecutive_nan(arr):
    max_ = 0      # longest NaN run seen so far
    current = 0   # length of the current NaN run
    idx = 0
    while idx < arr.size:
        # advance through a run of consecutive NaNs, counting them
        while idx < arr.size and math.isnan(arr[idx]):
            current += 1
            idx += 1
        if current > max_:
            max_ = current
        current = 0
        idx += 1
    return max_
For your examples:
>>> import numpy as np
>>> from numpy import nan
>>> max_consecutive_nan(np.array([nan, nan, 2, 1, 1, nan, nan, nan, nan, 0.101, nan, 0.16]))
4
>>> max_consecutive_nan(np.array([nan, nan, nan, 0.16, 1, 0.16, 0.9999, 0.0001, 0.16, 0.101, nan, 0.16]))
3
>>> max_consecutive_nan(np.array([0.16, 0.16, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]))
22
Using the benchmark proposed by @Divakar and ordered by performance (the complete code for the benchmarks can be found in this gist):
arr = np.random.rand(10000)
arr[np.random.choice(range(len(arr)),size=1000,replace=0)] = np.nan
%timeit mine(arr)         # 10000 loops, best of 3: 67.7 µs per loop
%timeit Divakar_v2(arr)   # 1000 loops, best of 3: 196 µs per loop
%timeit Divakar(arr)      # 1000 loops, best of 3: 252 µs per loop
%timeit Tagc(arr)         # 100 loops, best of 3: 6.92 ms per loop
%timeit Kasramvd(arr)     # 10 loops, best of 3: 38.2 ms per loop
%timeit pltrdy(arr)       # 10 loops, best of 3: 70.9 ms per loop