I have a NumPy array of shape (31641600, 2) that contains some, possibly many, zero values.
Let's call the array X.
Doing:
print(len(X))
>>> 31641600
But then doing:
X = X[np.nonzero(X)]
print(len(X))
>>> 31919809
I don't understand why the second one is bigger. According to the documentation, applying the above method should return only the non-zero values, so the length of X should be smaller.
Any ideas? Thank you.
This is probably because len(X) only returns X's length along the first axis. When you do X = X[np.nonzero(X)] you get a 1D array containing every non-zero element, so if fewer than 50% of the elements of X are zero, len(X) will increase.
Consider:
In [1]: import numpy as np
In [2]: X = np.zeros((42, 2))
In [3]: X[:, 0] = 1
In [4]: X[0, 1] = 1
In [5]: len(X)
Out[5]: 42
In [6]: len(X[np.nonzero(X)])
Out[6]: 43
That's because X[np.nonzero(X)] is an array of 43 ones:
In [7]: X[np.nonzero(X)].shape
Out[7]: (43,)
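To see why the result is flat, it may help to look at what np.nonzero actually returns. The sketch below uses a small made-up array (not the data from the question): np.nonzero gives one index array per axis, and indexing with that pair picks out individual elements rather than whole rows.

```python
import numpy as np

# Small illustrative array, chosen only for this example
X = np.array([[1, 0],
              [0, 0],
              [2, 3]])

rows, cols = np.nonzero(X)   # one index array per axis
print(rows)                  # [0 2 2]
print(cols)                  # [0 0 1]

flat = X[rows, cols]         # same as X[np.nonzero(X)]
print(flat)                  # [1 2 3]
print(flat.shape)            # (3,)

# The length of the result equals the number of non-zero elements
print(np.count_nonzero(X))   # 3
```

So len() of the result counts non-zero elements, while len() of the original counts rows, and the two numbers are not directly comparable.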
Update in response to comment: if in fact you want all pairs where the first element is non-zero, you can do:
X = X[ X[:, 0] != 0 ]
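Here is a small sketch of that row filter on made-up data, so the 2D shape is visible. The two variants with np.any and np.all are not from the original question; they are included only as assumptions about what "non-zero pairs" might mean in other cases.

```python
import numpy as np

# Illustrative data, not the array from the question
X = np.array([[1.0, 0.0],
              [0.0, 5.0],
              [0.0, 0.0],
              [2.0, 3.0]])

# Rows whose first element is non-zero (the filter shown above)
first_nonzero = X[X[:, 0] != 0]

# Possible variants, depending on what you need:
any_nonzero = X[np.any(X != 0, axis=1)]  # rows with at least one non-zero value
all_nonzero = X[np.all(X != 0, axis=1)]  # rows with no zero values at all

print(first_nonzero.shape)  # (2, 2)
print(any_nonzero.shape)    # (3, 2)
print(all_nonzero.shape)    # (1, 2)
```

Because the mask is one boolean per row, the result stays 2D and its len() can never exceed the original len(X).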