I have the following 10-by-5 NumPy array/matrix, which has a number of NaN values:
array([[ 0., 0., 0., 0., 1.],
[ 1., 1., 0., nan, nan],
[ 0., nan, 1., nan, nan],
[ 1., 1., 1., 1., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., nan],
[ nan, nan, 1., 1., 1.],
[ 0., 1., 0., 1., 0.],
[ 1., 0., 1., 0., 0.],
[ 0., 1., 0., 0., 0.]])
How does one measure exactly how sparse this array is? Is there a simple function in NumPy for measuring the percentage of missing values?
If you want to apply a NumPy function to SciPy sparse matrices, first check whether SciPy has its own implementation for the given sparse matrix class, or convert the sparse matrix to a NumPy array first (e.g., using the class's toarray() method) before applying the function.
If A_sparse is a SciPy sparse matrix, then the correct expression is sparsity = 1.0 - ( A_sparse.count_nonzero() / float(A_sparse.toarray().size) ). Using float(A_sparse.size) would give an incorrect sparsity of 0 (for any matrix with no explicitly stored zeros), because for a sparse matrix .size is the number of stored entries, not the total number of elements. In other words, float(A.toarray().size) and float(A.size) are not the same if A is a sparse matrix.
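A minimal sketch of the point above, assuming SciPy is installed and using a CSR matrix built from the same values as the dense example further down. It takes the element count from the shape with np.prod, which gives the same total as toarray().size without materializing the dense array, and shows why .size is the wrong denominator.

```python
import numpy as np
from scipy import sparse

# Same values as the dense example; csr_matrix stores only the non-zero entries
dense = np.array([[1, 1, 0, 1, 0, 0],
                  [1, 0, 2, 0, 0, 1],
                  [99, 0, 0, 2, 0, 0]])
A_sparse = sparse.csr_matrix(dense)

# Total number of elements (3 * 6 = 18), computed from the shape without densifying
n_elements = int(np.prod(A_sparse.shape))

# For a SciPy sparse matrix, .size is the number of *stored* entries (8 here)
n_stored = A_sparse.size

sparsity = 1.0 - A_sparse.count_nonzero() / n_elements      # 10/18, about 0.556
wrong_sparsity = 1.0 - A_sparse.count_nonzero() / n_stored  # 0.0, misleading

print(n_elements, n_stored, sparsity, wrong_sparsity)
```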
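Definition: sparsity is the fraction of elements that are zero, i.e. 1 - (number of non-zero elements / total number of elements).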
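import numpy as np
from numpy import array, count_nonzero
# create dense matrix
A = array([[1, 1, 0, 1, 0, 0], [1, 0, 2, 0, 0, 1], [99, 0, 0, 2, 0, 0]])
# If A contains NaN, replace it with 0 first, otherwise count_nonzero counts NaN as non-zero
# (nan_to_num's second positional argument is `copy`, so pass the fill value as nan=)
A = np.nan_to_num(A, nan=0.0)
print(A)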
#[[ 1 1 0 1 0 0]
# [ 1 0 2 0 0 1]
# [99 0 0 2 0 0]]
# calculate sparsity
sparsity = 1.0 - ( count_nonzero(A) / float(A.size) )
print(sparsity)
Results:
0.5555555555555556
np.isnan(a).sum() gives the number of NaN values, in this example 8.
np.prod(a.shape) is the total number of values, here 50. Their ratio gives the desired fraction:
In [1081]: np.isnan(a).sum()/np.prod(a.shape)
Out[1081]: 0.16
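Because the mean of a boolean array is the fraction of True entries, np.isnan(a).mean() gives the same ratio in one call, and passing axis=0 gives the fraction missing in each column. A minimal sketch, rebuilding the array from the question:

```python
import numpy as np

nan = np.nan
a = np.array([[0., 0., 0., 0., 1.],
              [1., 1., 0., nan, nan],
              [0., nan, 1., nan, nan],
              [1., 1., 1., 1., 0.],
              [0., 0., 0., 1., 0.],
              [0., 0., 0., 0., nan],
              [nan, nan, 1., 1., 1.],
              [0., 1., 0., 1., 0.],
              [1., 0., 1., 0., 0.],
              [0., 1., 0., 0., 0.]])

missing = np.isnan(a)               # boolean mask, True where a value is NaN
frac_missing = missing.mean()       # 8 / 50 = 0.16
pct_missing = 100 * frac_missing    # the same value as a percentage
per_column = missing.mean(axis=0)   # fraction of NaN in each column

print(frac_missing, pct_missing)
print(per_column)
```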
You might also find it useful to make a masked array from this:
In [1085]: a_ma=np.ma.masked_invalid(a)
In [1086]: print(a_ma)
[[0.0 0.0 0.0 0.0 1.0]
[1.0 1.0 0.0 -- --]
[0.0 -- 1.0 -- --]
[1.0 1.0 1.0 1.0 0.0]
[0.0 0.0 0.0 1.0 0.0]
[0.0 0.0 0.0 0.0 --]
[-- -- 1.0 1.0 1.0]
[0.0 1.0 0.0 1.0 0.0]
[1.0 0.0 1.0 0.0 0.0]
[0.0 1.0 0.0 0.0 0.0]]
The number of valid values then is:
In [1089]: a_ma.compressed().shape
Out[1089]: (42,)
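The masked array also tracks the missing entries directly: count() returns the number of unmasked values and np.ma.count_masked() the number of masked ones, so the missing fraction can be read off without calling np.isnan again. A minimal sketch, using only the first three rows of the question's array as a shortened stand-in:

```python
import numpy as np

nan = np.nan
# First three rows of the array from the question (a shortened stand-in)
a = np.array([[0., 0., 0., 0., 1.],
              [1., 1., 0., nan, nan],
              [0., nan, 1., nan, nan]])

a_ma = np.ma.masked_invalid(a)          # masks NaN (and inf) entries

n_valid = a_ma.count()                  # unmasked values: 10
n_missing = np.ma.count_masked(a_ma)    # masked values: 5
frac_missing = n_missing / a_ma.size    # 5 / 15

print(n_valid, n_missing, frac_missing)
```

Note that masked_invalid also masks infinite values, so this counts inf as missing too.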
Measuring the percentage of missing values has already been explained by 'hpaulj'.
I'll address the first part of your question, assuming the array contains zeros and non-zeros.
Sparsity refers to the proportion of zero values in an array, and density to the proportion of non-zero values. Suppose your array is X; get the count of non-zero values:
non_zero = np.count_nonzero(X)
the total number of values in X:
total_val = np.prod(X.shape)
Sparsity will be:
sparsity = (total_val - non_zero) / total_val
and density will be:
density = non_zero / total_val
Sparsity and density always sum to 1 (i.e., 100%).
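One caveat when X still contains NaN: np.count_nonzero counts NaN as non-zero (since NaN != 0), so missing values inflate the density. A minimal sketch applying the formulas above to the question's array, under the assumption that missing values should count as zeros (you might instead prefer to exclude them):

```python
import numpy as np

nan = np.nan
X = np.array([[0., 0., 0., 0., 1.],
              [1., 1., 0., nan, nan],
              [0., nan, 1., nan, nan],
              [1., 1., 1., 1., 0.],
              [0., 0., 0., 1., 0.],
              [0., 0., 0., 0., nan],
              [nan, nan, 1., 1., 1.],
              [0., 1., 0., 1., 0.],
              [1., 0., 1., 0., 0.],
              [0., 1., 0., 0., 0.]])

total_val = X.size                                     # 50 elements

# NaN != 0, so count_nonzero counts every NaN as a non-zero value
non_zero_raw = np.count_nonzero(X)                     # 17 ones + 8 NaN = 25

# Assumption: treat missing values as zeros before measuring sparsity
non_zero = np.count_nonzero(np.nan_to_num(X, nan=0.0)) # 17

sparsity = (total_val - non_zero) / total_val          # 33 / 50
density = non_zero / total_val                         # 17 / 50

print(non_zero_raw, non_zero, sparsity, density)
```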