How to calculate percentage of sparsity for a numpy array/matrix?

I have the following 10 by 5 numpy array/matrix, which has a number of NaN values:

array([[  0.,   0.,   0.,   0.,   1.],
       [  1.,   1.,   0.,  nan,  nan],
       [  0.,  nan,   1.,  nan,  nan],
       [  1.,   1.,   1.,   1.,   0.],
       [  0.,   0.,   0.,   1.,   0.],
       [  0.,   0.,   0.,   0.,  nan],
       [ nan,  nan,   1.,   1.,   1.],
       [  0.,   1.,   0.,   1.,   0.],
       [  1.,   0.,   1.,   0.,   0.],
       [  0.,   1.,   0.,   0.,   0.]])

How does one measure exactly how sparse this array is? Is there a simple function in numpy for measuring the percentage of missing values?

asked Aug 01 '16 21:08

ShanZhengYang

3 Answers

Definition:

sparsity = (number of zero-valued elements) / (total number of elements)

Code for a general case:

import numpy as np

# create a dense matrix
A = np.array([[1, 1, 0, 1, 0, 0], [1, 0, 2, 0, 0, 1], [99, 0, 0, 2, 0, 0]])

# If the array contains NaN, replace NaN with 0 first.
# The fill value must be passed as the `nan` keyword (NumPy >= 1.17).
A = np.nan_to_num(A, nan=0.0)

print(A)
#[[ 1  1  0  1  0  0]
# [ 1  0  2  0  0  1]
# [99  0  0  2  0  0]]

# calculate sparsity: 1 minus the fraction of non-zero elements
sparsity = 1.0 - np.count_nonzero(A) / A.size
print(sparsity)

Results:

0.555555555556
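
As a rough sketch of how the same calculation behaves on the array from the question, the snippet below assumes that missing (NaN) entries should be counted as empty. Note that `np.count_nonzero` treats NaN as non-zero, so the NaN values have to be replaced before counting. The variable names are illustrative only.

```python
import numpy as np

nan = np.nan
a = np.array([[0., 0., 0., 0., 1.],
              [1., 1., 0., nan, nan],
              [0., nan, 1., nan, nan],
              [1., 1., 1., 1., 0.],
              [0., 0., 0., 1., 0.],
              [0., 0., 0., 0., nan],
              [nan, nan, 1., 1., 1.],
              [0., 1., 0., 1., 0.],
              [1., 0., 1., 0., 0.],
              [0., 1., 0., 0., 0.]])

# NaN != 0, so the 8 NaN entries are counted together with the 17 ones
nonzero_with_nan = np.count_nonzero(a)

# Assumption: NaN counts as "empty", so map it to 0.0 before counting
a_filled = np.nan_to_num(a, nan=0.0)
nonzero_filled = np.count_nonzero(a_filled)

sparsity = 1.0 - nonzero_filled / a_filled.size
print(nonzero_with_nan, nonzero_filled, round(sparsity, 2))
```

Under that assumption the sparsity comes out at about 0.66 (33 of the 50 entries are zero or missing). If NaN should not count as empty, the missing values need to be measured separately.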

answered Oct 22 '22 05:10

seralouk

np.isnan(a).sum()

gives the number of NaN values, in this example 8.

np.prod(a.shape)

is the number of values, here 50. Their ratio should give the desired value.

In [1081]: np.isnan(a).sum()/np.prod(a.shape)
Out[1081]: 0.16
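
The ratio above can also be wrapped in a small helper. This is only a sketch: `nan_fraction` is a hypothetical name, and the sample array is made up for illustration. It relies on the fact that the mean of a boolean mask equals the fraction of `True` entries.

```python
import numpy as np

def nan_fraction(arr):
    """Return the fraction of entries that are NaN (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    # mean of a boolean mask = count of True / total number of entries
    return float(np.isnan(arr).mean())

# small illustrative array: 3 NaN values out of 8 entries
sample = np.array([[0.0, np.nan],
                   [1.0, np.nan],
                   [np.nan, 1.0],
                   [0.0, 0.0]])

print(f"{nan_fraction(sample):.1%}")  # prints 37.5%
```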

You might also find it useful to make a masked array from this:

In [1085]: a_ma=np.ma.masked_invalid(a)
In [1086]: print(a_ma)
[[0.0 0.0 0.0 0.0 1.0]
 [1.0 1.0 0.0 -- --]
 [0.0 -- 1.0 -- --]
 [1.0 1.0 1.0 1.0 0.0]
 [0.0 0.0 0.0 1.0 0.0]
 [0.0 0.0 0.0 0.0 --]
 [-- -- 1.0 1.0 1.0]
 [0.0 1.0 0.0 1.0 0.0]
 [1.0 0.0 1.0 0.0 0.0]
 [0.0 1.0 0.0 0.0 0.0]]

The number of valid values then is:

In [1089]: a_ma.compressed().shape
Out[1089]: (42,)
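
A masked array also exposes these counts directly. The following sketch uses a small made-up array and assumes that only NaN (or inf) entries should be treated as missing; `count()` and `np.ma.count_masked` report the valid and masked entries respectively.

```python
import numpy as np

# illustrative array: 2 NaN values out of 6 entries
b = np.array([[0.0, 1.0, np.nan],
              [np.nan, 0.0, 1.0]])
b_ma = np.ma.masked_invalid(b)

n_valid = b_ma.count()                 # number of unmasked entries
n_masked = np.ma.count_masked(b_ma)    # number of masked (NaN) entries

missing_fraction = n_masked / b_ma.size

# share of zeros among the valid entries only
valid = b_ma.compressed()
zero_fraction_of_valid = np.count_nonzero(valid == 0) / n_valid

print(n_valid, n_masked, zero_fraction_of_valid)
```

Keeping the two fractions separate makes it explicit whether "sparse" means missing values or zero values.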

answered Oct 22 '22 03:10

hpaulj

Measuring the percentage of missing values has already been explained by hpaulj.

I am addressing the first part of your question, assuming the array contains zeros and non-zeros.

Sparsity refers to the proportion of zero values in an array, and density to the proportion of non-zero values. Suppose your array is X; first get the count of non-zero values:

non_zero = np.count_nonzero(X)

Total number of values in X (np.product is deprecated and was removed in NumPy 2.0, so use np.prod or X.size):

total_val = np.prod(X.shape)

Sparsity is then:

sparsity = (total_val - non_zero) / total_val

And density is:

density = non_zero / total_val

Sparsity and density must sum to 1 (100%).
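
Put together, the steps above might look like the following sketch. The function name `sparsity_and_density` and the sample array are assumptions made for illustration, and the array is assumed to contain no NaN values.

```python
import numpy as np

def sparsity_and_density(X):
    """Return (sparsity, density) of X as fractions (hypothetical helper)."""
    X = np.asarray(X)
    non_zero = np.count_nonzero(X)
    total_val = X.size          # same value as np.prod(X.shape)
    density = non_zero / total_val
    sparsity = (total_val - non_zero) / total_val
    return sparsity, density

# illustrative array: 3 non-zero values out of 8 entries
X = np.array([[1, 0, 0, 2],
              [0, 0, 3, 0]])

sparsity, density = sparsity_and_density(X)
print(sparsity, density)  # prints 0.625 0.375
```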

answered Oct 22 '22 03:10

Arun Kumar Khattri

Tags: python, arrays, matrix, numpy, sparse-matrix