 

Python Numpy nonzero

I have a NumPy array of shape (31641600, 2) that contains some, possibly many, zero values.

Let's call the array X.

Doing:

print len(X)
>>> 31641600

But then doing:

X = X[np.nonzero(X)]
print len(X)
>>> 31919809

I don't understand why the second result is bigger. The documentation says that np.nonzero selects only the non-zero values, so I expected the length of X to be smaller.

Any ideas? Thank you.

Claudiu S asked Jun 25 '14 15:06


1 Answer

This happens because len(X) only returns X's length along the first axis, not the total number of elements. When you do

X = X[np.nonzero(X)]

you get a 1D array containing every non-zero element of X. Since X holds 2 * len(X) elements in total, len(X) will increase whenever fewer than 50% of those elements are zero.

Consider:

In [1]: import numpy as np

In [2]: X = np.zeros((42, 2))

In [3]: X[:, 0] = 1

In [4]: X[0, 1] = 1

In [5]: len(X)
Out[5]: 42

In [6]: len(X[np.nonzero(X)])
Out[6]: 43

That's because X[np.nonzero(X)] is a 1D array of 43 ones:

In [7]: X[np.nonzero(X)].shape
Out[7]: (43,)
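
To see where the 1D result comes from, it may help to look at what np.nonzero actually returns. The sketch below reuses the same 42 x 2 array as above; the names rows, cols and values are only for illustration. np.nonzero gives back one index array per axis, and indexing with that pair picks out individual elements rather than whole rows:

```python
import numpy as np

# Same toy array as in the session above: 42 rows, 2 columns.
X = np.zeros((42, 2))
X[:, 0] = 1
X[0, 1] = 1

# np.nonzero returns a tuple with one index array per axis.
rows, cols = np.nonzero(X)

# Indexing with both arrays selects single elements, so the result is 1D.
values = X[rows, cols]  # equivalent to X[np.nonzero(X)]

print(X.shape, X.size)        # 42 rows, 84 elements in total
print(values.shape)           # one entry per non-zero element
print(np.count_nonzero(X))    # same count, without building the array
```

So len(X) counts rows (42), while len(X[np.nonzero(X)]) counts non-zero elements (43 out of 84).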

Update in response to comment: if in fact you want all pairs where the first element is non-zero, you can do:

X = X[X[:, 0] != 0]
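
If the goal is instead to keep whole rows based on some other condition, the same boolean-mask idea should carry over. This is only a sketch on a small made-up array (the data and the variable names are assumptions, not taken from the question); it compares the first-column filter with "any element non-zero" and "all elements non-zero":

```python
import numpy as np

# Small made-up array standing in for the real (31641600, 2) data.
X = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 3.0],
              [4.0, 5.0]])

# Rows whose first element is non-zero (the filter shown above).
first_nonzero = X[X[:, 0] != 0]

# Rows with at least one non-zero element.
any_nonzero = X[(X != 0).any(axis=1)]

# Rows with no zero elements at all.
all_nonzero = X[(X != 0).all(axis=1)]

print(first_nonzero.shape)  # (2, 2)
print(any_nonzero.shape)    # (3, 2)
print(all_nonzero.shape)    # (1, 2)
```

In each case the mask has one boolean per row, so the result stays 2D and its length can never exceed len(X).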
Lev Levitsky answered Oct 20 '22 03:10