I need to calculate the number of non-NaN elements in a numpy ndarray matrix. How would one efficiently do this in Python? Here is my simple code for achieving this:
```python
import numpy as np

def numberOfNonNans(data):
    count = 0
    for i in data:
        if not np.isnan(i):
            count += 1
    return count
```
Is there a built-in function for this in numpy? Efficiency is important because I'm doing Big Data analysis.
Thanks for any help!
To count the non-NaN entries in the dataset, call np.isnan to get a boolean mask that is True wherever a value is NaN, invert that mask with ~, and then pass the result to np.count_nonzero to sum up the total.
Use count_nonzero() to count True elements in a NumPy array. NumPy provides the function count_nonzero(arr, axis=None), which returns the count of non-zero values in the given array. When axis is None, it counts the non-zero values across the entire flattened array.
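As a minimal sketch of the above (the small array here is invented for illustration): inverting the NaN mask and passing it to count_nonzero counts non-NaN values, either over the whole array or per axis.

```python
import numpy as np

# Hypothetical 2x3 array with three NaNs, for illustration only.
arr = np.array([[1.0, np.nan, 3.0],
                [np.nan, np.nan, 6.0]])

mask = ~np.isnan(arr)                        # True where the value is NOT NaN

total = np.count_nonzero(mask)               # axis=None: count over the whole array
per_column = np.count_nonzero(mask, axis=0)  # count of non-NaN values per column

print(total)       # → 3
print(per_column)  # → [1 0 2]
```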
(Note that np.char.count is a different NumPy function: it counts occurrences of a substring within each element of a string array, which is not what is needed here.)
```python
np.count_nonzero(~np.isnan(data))
```

`~` inverts the boolean mask returned by np.isnan. np.count_nonzero counts values that are not 0/False. Calling `.sum()` on the inverted mask gives the same result, but count_nonzero may be the clearer choice.
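To make the equivalence concrete, here is a small demonstration (the sample array is made up for illustration) showing that the mask-based expressions and the size-minus-NaN-count approach all agree:

```python
import numpy as np

# Hypothetical data: five elements, two of them NaN.
data = np.array([1.0, np.nan, 2.5, np.nan, 0.0])

a = np.count_nonzero(~np.isnan(data))  # count True entries in the inverted mask
b = (~np.isnan(data)).sum()            # summing booleans gives the same count
c = data.size - np.isnan(data).sum()   # total size minus the NaN count

print(a, b, c)  # → 3 3 3  (note: the 0.0 entry is counted; it is not NaN)
```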
Testing speed:
```
In [23]: data = np.random.random((10000,10000))

In [24]: data[[np.random.random_integers(0,10000, 100)],:][:, [np.random.random_integers(0,99, 100)]] = np.nan

In [25]: %timeit data.size - np.count_nonzero(np.isnan(data))
1 loops, best of 3: 309 ms per loop

In [26]: %timeit np.count_nonzero(~np.isnan(data))
1 loops, best of 3: 345 ms per loop

In [27]: %timeit data.size - np.isnan(data).sum()
1 loops, best of 3: 339 ms per loop
```
```python
data.size - np.count_nonzero(np.isnan(data))
```

seems to be marginally the fastest here; other data might give different relative speed results.
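If you want to reproduce a setup like the one timed above on a current NumPy, note that np.random.random_integers has since been removed; a sketch using the modern Generator API instead (the seed, sizes, and index counts here are arbitrary choices, and np.ix_ is used so the assignment actually writes into the array rather than into a fancy-indexing copy):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 1000))    # smaller than the original 10000x10000 for a quick run

rows = rng.integers(0, 1000, 50)   # rng.integers replaces the removed random_integers
cols = rng.integers(0, 1000, 50)
data[np.ix_(rows, cols)] = np.nan  # plant NaNs at the row/column crossings

count = data.size - np.count_nonzero(np.isnan(data))
```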