
Counting the number of non-NaN elements in a numpy ndarray in Python

I need to calculate the number of non-NaN elements in a numpy ndarray matrix. How would one efficiently do this in Python? Here is my simple code for achieving this:

import numpy as np

def numberOfNonNans(data):
    count = 0
    for i in data:
        if not np.isnan(i):
            count += 1
    return count

Is there a built-in function for this in numpy? Efficiency is important because I'm doing Big Data analysis.

Thanks for any help!

jjepsuomi asked Feb 14 '14 at 11:02



People also ask

How do you count non NaN values in Numpy array?

To count the NaN entries in a dataset, call np.isnan to get a boolean mask that is True wherever the data is NaN, then pass that mask to np.count_nonzero to total the True values. To count the non-NaN entries instead, invert the mask with ~ or subtract the NaN count from the array's size.
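As a minimal sketch of that masking approach (the array `data` is made-up sample data, not taken from the question), the NaN count and its complement, the non-NaN count, could be computed like this:

```python
import numpy as np

# Hypothetical sample data: a 2x3 matrix with two NaN entries.
data = np.array([[1.0, np.nan, 3.0],
                 [np.nan, 5.0, 6.0]])

nan_mask = np.isnan(data)                # True where an entry is NaN
nan_count = np.count_nonzero(nan_mask)   # number of NaN entries
non_nan_count = data.size - nan_count    # all remaining entries

print(nan_count, non_nan_count)
```

Because np.isnan works element-wise on arrays of any shape, no explicit Python loop is needed.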

How do you count the number of elements in Numpy Ndarray?

Use count_nonzero() to count the True elements in a NumPy array. NumPy provides count_nonzero(arr, axis=None), which returns the number of non-zero values in the given array. When axis is None, it counts the non-zero values across the whole array.
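A short illustration of the axis argument, using a made-up integer array `arr`; this is only a sketch of how the calls could look:

```python
import numpy as np

# Hypothetical sample array with three non-zero entries.
arr = np.array([[0, 1, 2],
                [0, 0, 5]])

total = np.count_nonzero(arr)               # whole array (axis=None)
per_column = np.count_nonzero(arr, axis=0)  # one count per column
per_row = np.count_nonzero(arr, axis=1)     # one count per row

# A boolean condition can be counted the same way, since True is non-zero.
greater_than_one = np.count_nonzero(arr > 1)

print(total, per_column, per_row, greater_than_one)
```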

Which function is used to count the number of elements in a NumPy array?

np.char.count() is a NumPy function that counts the non-overlapping occurrences of a substring in each string element of an array.
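For string arrays, a minimal sketch with np.char.count could look like the following; the array `words` and the substring "an" are made-up examples:

```python
import numpy as np

# Hypothetical array of strings.
words = np.array(["banana", "bandana", "cherry"])

# Count non-overlapping occurrences of "an" in each element.
counts = np.char.count(words, "an")

print(counts)
```

Note that this counts substrings inside each string; it is unrelated to counting NaN values in a numeric array.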


1 Answer

np.count_nonzero(~np.isnan(data)) 

~ inverts the boolean matrix returned from np.isnan.

np.count_nonzero counts the values that are not 0/False. .sum() should give the same result, but count_nonzero arguably states the intent more clearly.
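To make the equivalence concrete, here is a small sketch on a made-up array `data`. It also shows a per-column count via the axis argument, which is an addition for illustration and not part of the one-liner above:

```python
import numpy as np

# Hypothetical sample data: three valid values, three NaNs.
data = np.array([[1.0, np.nan, 3.0],
                 [4.0, np.nan, np.nan]])

valid = ~np.isnan(data)                       # True where the entry is not NaN

via_count_nonzero = np.count_nonzero(valid)   # counts the True entries
via_sum = valid.sum()                         # True sums as 1, False as 0

# Non-NaN count per column.
per_column = np.count_nonzero(valid, axis=0)

print(via_count_nonzero, via_sum, per_column)
```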

Testing speed:

In [23]: data = np.random.random((10000,10000))

In [24]: data[[np.random.random_integers(0,10000, 100)],:][:, [np.random.random_integers(0,99, 100)]] = np.nan

In [25]: %timeit data.size - np.count_nonzero(np.isnan(data))
1 loops, best of 3: 309 ms per loop

In [26]: %timeit np.count_nonzero(~np.isnan(data))
1 loops, best of 3: 345 ms per loop

In [27]: %timeit data.size - np.isnan(data).sum()
1 loops, best of 3: 339 ms per loop

data.size - np.count_nonzero(np.isnan(data)) seems to be the fastest here, but only barely. Other data might give different relative speed results.
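To repeat the comparison outside IPython, a sketch with the standard timeit module might look like the following. The array size, the seed, and the use of np.random.default_rng are assumptions chosen for a quick run, not the setup timed above. Note that an assignment through two chained fancy-indexing steps, as in In [24], writes to a temporary copy, so this sketch sets the NaNs with a single indexing step instead.

```python
import timeit
import numpy as np

# Assumed setup: a smaller array than in the session above, fixed seed.
rng = np.random.default_rng(0)
data = rng.random((1000, 1000))

# Place NaNs at up to 100 random positions in one indexing step,
# so the assignment modifies `data` itself rather than a copy.
rows = rng.integers(0, data.shape[0], size=100)
cols = rng.integers(0, data.shape[1], size=100)
data[rows, cols] = np.nan

candidates = {
    "size - count_nonzero(isnan)": lambda: data.size - np.count_nonzero(np.isnan(data)),
    "count_nonzero(~isnan)": lambda: np.count_nonzero(~np.isnan(data)),
    "size - isnan.sum()": lambda: data.size - np.isnan(data).sum(),
}

# All three expressions should agree on the non-NaN count.
results = {name: int(fn()) for name, fn in candidates.items()}
assert len(set(results.values())) == 1

for name, fn in candidates.items():
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name}: {best * 1e3:.3f} ms per call")
```

Absolute and relative timings depend on the NumPy version, the hardware, and the array size, so the ordering may differ from the session above.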

M4rtini answered Oct 09 '22 at 00:10
