
Numpy mean of nonzero values


I have a matrix of size N*M and I want to find the mean value of each row. The values range from 1 to 5, and entries that do not have any value are set to 0. However, when I compute the mean with the following method, the result is wrong because it also counts the entries that have a value of 0.

matrix_row_mean = matrix.mean(axis=1)

How can I get the mean of only nonzero values?

HimanAB asked Jul 23 '16 13:07





2 Answers

Get the count of non-zeros in each row and use that for averaging the summation along each row. Thus, the implementation would look something like this -

np.true_divide(matrix.sum(1),(matrix!=0).sum(1)) 

If you are on an older version of NumPy, you can use float conversion of the count to replace np.true_divide, like so -

matrix.sum(1)/(matrix!=0).sum(1).astype(float) 

Sample run -

In [160]: matrix
Out[160]:
array([[0, 0, 1, 0, 2],
       [1, 0, 0, 2, 0],
       [0, 1, 1, 0, 0],
       [0, 2, 2, 2, 2]])

In [161]: np.true_divide(matrix.sum(1),(matrix!=0).sum(1))
Out[161]: array([ 1.5,  1.5,  1. ,  2. ])

Another way to solve the problem would be to replace zeros with NaNs and then use np.nanmean, which would ignore those NaNs and in effect those original zeros, like so -

np.nanmean(np.where(matrix!=0,matrix,np.nan),1) 

From a performance point of view, I would recommend the first approach.
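One case neither one-liner spells out is a row with no nonzero entries at all, where the count is 0. The following is a minimal sketch of the first approach that returns NaN for such rows instead of dividing by zero. The helper name `nonzero_row_mean` and the extra all-zero row are assumptions added for illustration, not part of the original answer.

```python
import numpy as np

def nonzero_row_mean(matrix):
    """Mean of the nonzero entries of each row (hypothetical helper).

    Rows without any nonzero entry yield NaN.
    """
    matrix = np.asarray(matrix, dtype=float)
    sums = matrix.sum(axis=1)              # zeros contribute nothing to the sum
    counts = (matrix != 0).sum(axis=1)     # number of nonzero entries per row
    out = np.full(sums.shape, np.nan)      # default result for empty rows
    # Divide only where there is at least one nonzero entry.
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

# Sample matrix from the answer, plus an all-zero row (an added assumption).
matrix = np.array([[0, 0, 1, 0, 2],
                   [1, 0, 0, 2, 0],
                   [0, 1, 1, 0, 0],
                   [0, 2, 2, 2, 2],
                   [0, 0, 0, 0, 0]])

row_means = nonzero_row_mean(matrix)
print(row_means)  # first four rows: 1.5, 1.5, 1.0, 2.0; last row: nan
```

Whether NaN is the right placeholder for an empty row depends on what the downstream code expects; it is used here only because it cannot be mistaken for a real rating.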

Divakar answered Sep 21 '22 02:09



I will detail here the more general solution, which uses a masked array. To illustrate the details, let's create a lower triangular matrix containing only ones:

matrix = np.tril(np.ones((5, 5)), 0) 

If the terminology above is not clear, this matrix looks like this:

[[ 1.,  0.,  0.,  0.,  0.],
 [ 1.,  1.,  0.,  0.,  0.],
 [ 1.,  1.,  1.,  0.,  0.],
 [ 1.,  1.,  1.,  1.,  0.],
 [ 1.,  1.,  1.,  1.,  1.]]

Now, we want our function to return an average of 1 for each row; in other words, the mean over axis 1 should equal a vector of five ones. To achieve this, we create a masked matrix in which the entries whose value is zero are considered invalid. This can be done with np.ma.masked_equal:

masked = np.ma.masked_equal(matrix, 0) 

Finally, we perform NumPy operations on this array, which will systematically ignore the masked elements (the 0's). With this in mind, we obtain the desired result with:

masked.mean(axis=1) 

This should produce a vector whose entries are only ones.
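The same steps can be tried on rating-style data like that in the question. The sketch below is only an illustration: the sample values and the extra all-zero row are assumptions, and converting the result with `filled(np.nan)` is one possible way to get a plain array back, not something the answer prescribes.

```python
import numpy as np

# Ratings from 1 to 5, with 0 meaning "no value" (sample data is assumed).
matrix = np.array([[0, 0, 1, 0, 2],
                   [1, 0, 0, 2, 0],
                   [0, 1, 1, 0, 0],
                   [0, 2, 2, 2, 2],
                   [0, 0, 0, 0, 0]])   # row with no ratings at all

# Mask every entry equal to 0 so that reductions skip it.
masked = np.ma.masked_equal(matrix, 0)

# Row-wise mean over the unmasked entries only; the result is a masked array.
row_means = masked.mean(axis=1)

# A fully masked row has no valid entries, so its mean is itself masked.
# Replace masked results with NaN to obtain an ordinary ndarray.
row_means_filled = row_means.filled(np.nan)
print(row_means_filled)
```

Keeping the result as a masked array instead of filling it is equally reasonable when later steps also understand masks.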


In more detail, the output of np.ma.masked_equal(matrix, 0) should look like this:

masked_array(data =
 [[1.0 -- -- -- --]
 [1.0 1.0 -- -- --]
 [1.0 1.0 1.0 -- --]
 [1.0 1.0 1.0 1.0 --]
 [1.0 1.0 1.0 1.0 1.0]],
             mask =
 [[False  True  True  True  True]
 [False False  True  True  True]
 [False False False  True  True]
 [False False False False  True]
 [False False False False False]],
       fill_value = 0.0)

This indicates that the values shown as -- are considered invalid. The mask attribute of the masked array shows the same thing: True marks an invalid element that should therefore be ignored.

Finally, the output of the mean operation on this array is:

masked_array(data = [1.0 1.0 1.0 1.0 1.0],
             mask = [False False False False False],
       fill_value = 1e+20)
Heberto Mayorquin answered Sep 21 '22 02:09
