How to calculate percentage of sparsity for a numpy array/matrix?

I have the following 10 by 5 numpy array/matrix, which has a number of NaN values:

array([[  0.,   0.,   0.,   0.,   1.],
       [  1.,   1.,   0.,  nan,  nan],
       [  0.,  nan,   1.,  nan,  nan],
       [  1.,   1.,   1.,   1.,   0.],
       [  0.,   0.,   0.,   1.,   0.],
       [  0.,   0.,   0.,   0.,  nan],
       [ nan,  nan,   1.,   1.,   1.],
       [  0.,   1.,   0.,   1.,   0.],
       [  1.,   0.,   1.,   0.,   0.],
       [  0.,   1.,   0.,   0.,   0.]])

How does one measure exactly how sparse this array is? Is there a simple function in numpy for measuring the percentage of missing values?

ShanZhengYang asked Aug 01 '16 21:08


3 Answers

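Definition:

Sparsity is the fraction of zero-valued elements in the array: sparsity = 1 - (number of nonzero elements) / (total number of elements).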

Code for a general case:

from numpy import array
from numpy import count_nonzero
import numpy as np

# create dense matrix
A = array([[1, 1, 0, 1, 0, 0], [1, 0, 2, 0, 0, 1], [99, 0, 0, 2, 0, 0]])

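# If the array contains NaN, replace each NaN with 0 (the default fill value).
# Note: the second positional argument of nan_to_num is `copy`, not the fill value.
A = np.nan_to_num(A)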

print(A)
#[[ 1  1  0  1  0  0]
# [ 1  0  2  0  0  1]
# [99  0  0  2  0  0]]

# calculate sparsity
sparsity = 1.0 - ( count_nonzero(A) / float(A.size) )
print(sparsity)

Results:

0.555555555556
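
One point worth checking before applying the formula above to data with missing values: `np.count_nonzero` counts NaN as a nonzero entry, so the result depends on whether `nan_to_num` is applied first. The following is a minimal sketch on a small made-up array (the array and the names `sparsity_raw` and `sparsity_filled` are illustrative assumptions, not part of the original answer):

```python
import numpy as np

# Illustrative array: 3 zeros, 2 ones, 1 NaN (6 entries in total).
A = np.array([[0.0, 1.0, np.nan],
              [0.0, 0.0, 1.0]])

# NaN is not equal to 0, so count_nonzero treats it as a nonzero entry.
sparsity_raw = 1.0 - np.count_nonzero(A) / A.size

# After nan_to_num the NaN becomes 0.0 and is counted as a zero entry.
sparsity_filled = 1.0 - np.count_nonzero(np.nan_to_num(A)) / A.size

print(sparsity_raw, sparsity_filled)
```

Here the first value corresponds to 3 zeros out of 6 entries and the second to 4 out of 6, so it is worth deciding explicitly whether missing values should count as zeros.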
seralouk answered Oct 22 '22 05:10


np.isnan(a).sum()

gives the number of nan values, in this example 8.

np.prod(a.shape)

is the number of values, here 50. Their ratio should give the desired value.

In [1081]: np.isnan(a).sum()/np.prod(a.shape)
Out[1081]: 0.16
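
As a self-contained sketch of the ratio above, here is the same calculation as a script on the array from the question. The names `n_missing`, `n_total` and `pct_missing` are illustrative, and `a.size` is used as an equivalent of `np.prod(a.shape)`:

```python
import numpy as np

nan = np.nan
# The 10 x 5 array from the question.
a = np.array([[0, 0, 0, 0, 1],
              [1, 1, 0, nan, nan],
              [0, nan, 1, nan, nan],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, nan],
              [nan, nan, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 0, 0]], dtype=float)

n_missing = int(np.isnan(a).sum())   # number of NaN entries
n_total = a.size                     # total number of entries
pct_missing = 100.0 * n_missing / n_total

print(n_missing, n_total, pct_missing)  # 8 50 16.0
```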

You might also find it useful to make a masked array from this:

In [1085]: a_ma=np.ma.masked_invalid(a)
In [1086]: print(a_ma)
[[0.0 0.0 0.0 0.0 1.0]
 [1.0 1.0 0.0 -- --]
 [0.0 -- 1.0 -- --]
 [1.0 1.0 1.0 1.0 0.0]
 [0.0 0.0 0.0 1.0 0.0]
 [0.0 0.0 0.0 0.0 --]
 [-- -- 1.0 1.0 1.0]
 [0.0 1.0 0.0 1.0 0.0]
 [1.0 0.0 1.0 0.0 0.0]
 [0.0 1.0 0.0 0.0 0.0]]

The number of valid values then is:

In [1089]: a_ma.compressed().shape
Out[1089]: (42,)
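
The masked array can also report these counts directly, without compressing it first. This is a minimal sketch assuming the same array as in the question; `n_valid`, `n_masked` and `frac_missing` are illustrative names:

```python
import numpy as np

nan = np.nan
# The 10 x 5 array from the question.
a = np.array([[0, 0, 0, 0, 1],
              [1, 1, 0, nan, nan],
              [0, nan, 1, nan, nan],
              [1, 1, 1, 1, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, nan],
              [nan, nan, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 0, 0]], dtype=float)

a_ma = np.ma.masked_invalid(a)

n_valid = a_ma.count()                # unmasked (non-NaN) entries
n_masked = np.ma.count_masked(a_ma)   # masked (NaN) entries
frac_missing = n_masked / a_ma.size   # fraction of missing values

print(n_valid, n_masked, frac_missing)
```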
hpaulj answered Oct 22 '22 03:10


Measuring the percentage of missing values has already been explained by hpaulj.

I am taking the first part of your question, assuming the array contains only zeros and non-zeros.

Sparsity refers to the zero values and density to the non-zero values in an array. Suppose your array is X; get the count of non-zero values:

non_zero = np.count_nonzero(X)

Total number of values in X (np.product was removed in NumPy 2.0; np.prod, or simply X.size, works across versions):

total_val = np.prod(X.shape)

Sparsity will be -

sparsity = (total_val - non_zero) / total_val

And Density will be -

density = non_zero / total_val

The sum of sparsity and density must equal 100%.
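
Put together, the steps above might look like the following sketch. The helper name `sparsity_and_density` and the sample array are assumptions made for illustration, and the function assumes a non-empty array without NaN values:

```python
import numpy as np

def sparsity_and_density(X):
    """Return (sparsity, density) as fractions for a non-empty, NaN-free array."""
    X = np.asarray(X)
    total_val = X.size                 # total number of entries
    non_zero = np.count_nonzero(X)     # number of nonzero entries
    sparsity = (total_val - non_zero) / total_val
    density = non_zero / total_val
    return sparsity, density

# Illustrative array: 3 nonzero entries out of 8.
X = np.array([[1, 0, 0, 2],
              [0, 0, 3, 0]])

sparsity, density = sparsity_and_density(X)
print(sparsity, density)  # 0.625 0.375
```

Multiply either value by 100 to express it as a percentage.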

Arun Kumar Khattri answered Oct 22 '22 03:10