
Filtering out columns with zero values in numpy

Tags: python, numpy

Given a numpy array A such as:

[[    0.   482.  1900.   961.   579.    56.]
 [    0.   530.  1906.   914.   584.    44.]
 [   43.     0.  1932.   948.   556.    51.]
 [    0.   482.  1917.   946.   581.    52.]
 [    0.   520.  1935.   878.   589.    55.]]

I'd like to get a new array that excludes all columns where a 0 appears, that is:

[[  1900.   961.   579.    56.]
 [  1906.   914.   584.    44.]
 [  1932.   948.   556.    51.]
 [  1917.   946.   581.    52.]
 [  1935.   878.   589.    55.]]

What I've tried is the following:

non_zero = np.array([np.all(A > 0, axis=0)] * N_ROWS)

Which gives me:

[[False  False  True  True  True  True]
 [False  False  True  True  True  True]
 [False  False  True  True  True  True]
 [False  False  True  True  True  True]
 [False  False  True  True  True  True]]

The trouble is that A[non_zero] then returns the expected values, but flattened into a one-dimensional vector.

So, do you guys know what I'm doing wrong, or if I'm overcomplicating things? Thanks!

UPDATE: thanks for all the answers! One short thing in addition to the accepted answer: clearly, aside from the selection itself, I should have used a : slice on the row axis, as in:

non_zero = np.all(A != 0, axis=0)
A[:, non_zero]

And then of course, there are more compact ways (see accepted answer).
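
For illustration, here is a minimal sketch of why a full-shape boolean mask flattens the result while a 1-D column mask does not. The array is the one from the question; the names `col_mask` and `mask_2d` are made up for this example.

```python
import numpy as np

A = np.array([[ 0., 482., 1900., 961., 579., 56.],
              [ 0., 530., 1906., 914., 584., 44.],
              [43.,   0., 1932., 948., 556., 51.],
              [ 0., 482., 1917., 946., 581., 52.],
              [ 0., 520., 1935., 878., 589., 55.]])

# One boolean per column: True where the column contains no zero.
col_mask = np.all(A != 0, axis=0)

# Repeating that row once per row of A gives a mask with A's full shape.
mask_2d = np.tile(col_mask, (A.shape[0], 1))

# A full-shape boolean mask selects individual elements, so the result is 1-D.
flat = A[mask_2d]
print(flat.shape)   # (20,)

# A 1-D mask applied on the column axis keeps whole columns, so the result stays 2-D.
kept = A[:, col_mask]
print(kept.shape)   # (5, 4)
```

The flattened result holds the same values in row-major order, so `flat.reshape(kept.shape)` would recover the 2-D layout, but indexing with `A[:, col_mask]` avoids the detour.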

Miquel asked Dec 11 '13 14:12


2 Answers

Assuming here that your values are non-negative:

A = np.array([[ 0, 482, 1900, 961, 579, 56.],
              [ 0, 530, 1906, 914, 584, 44.],
              [ 43, 0, 1932, 948, 556, 51.],
              [ 0, 482, 1917, 946, 581, 52.],
              [ 0, 520, 1935, 878, 589, 55.]])

A[:, np.all(A > 0, axis=0)]

Which gives me

array([[ 1900.,   961.,   579.,    56.],
       [ 1906.,   914.,   584.,    44.],
       [ 1932.,   948.,   556.,    51.],
       [ 1917.,   946.,   581.,    52.],
       [ 1935.,   878.,   589.,    55.]])

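If you ignore potential precision issues, you can instead test for exact zeros and negate the result: ~np.any(A == 0, axis=0), or equivalently np.all(A != 0, axis=0). This also works when the array contains negative values.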
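
As a rough sketch of how the `> 0` test differs from an exact-zero test once negative values are present (the array `X` and the mask names are hypothetical, chosen only for this illustration):

```python
import numpy as np

# Made-up data: column 0 has a negative value but no zero, column 1 has a zero.
X = np.array([[-1., 0., 3.],
              [ 2., 5., 4.]])

positive_cols = np.all(X > 0, axis=0)    # also drops column 0 because of the -1
nonzero_cols = np.all(X != 0, axis=0)    # drops only the column containing a zero
no_zero_any = ~np.any(X == 0, axis=0)    # same mask as nonzero_cols, written with any()

print(X[:, positive_cols])   # only column 2 survives
print(X[:, nonzero_cols])    # columns 0 and 2 survive
```

Which test is appropriate depends on whether negative entries should count as valid data.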

YXD answered Oct 13 '22 00:10


How about something like this? You can find the indices of columns where there are exactly zero 0s by summing the number of zeros in each column, and finding which sums are zero:

>>> B = A[:, np.sum(A == 0, axis=0) == 0 ]
>>> B
array([[ 1900.,   961.,   579.,    56.],
       [ 1906.,   914.,   584.,    44.],
       [ 1932.,   948.,   556.,    51.],
       [ 1917.,   946.,   581.,    52.],
       [ 1935.,   878.,   589.,    55.]])
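
To make the counting step visible, here is a small sketch that prints the per-column zero counts before filtering. It uses the array from the question; the name `zeros_per_column` is an assumption for this example, and the `np.count_nonzero` variant is just one possible alternative, not part of the original answer.

```python
import numpy as np

A = np.array([[ 0., 482., 1900., 961., 579., 56.],
              [ 0., 530., 1906., 914., 584., 44.],
              [43.,   0., 1932., 948., 556., 51.],
              [ 0., 482., 1917., 946., 581., 52.],
              [ 0., 520., 1935., 878., 589., 55.]])

# A == 0 is a boolean array; summing along axis 0 counts the zeros in each column.
zeros_per_column = np.sum(A == 0, axis=0)
print(zeros_per_column)   # [4 1 0 0 0 0]

# Keep only the columns whose zero count is 0.
B = A[:, zeros_per_column == 0]
print(B.shape)            # (5, 4)

# Alternative: a column has no zeros when every one of its entries is nonzero.
B_alt = A[:, np.count_nonzero(A, axis=0) == A.shape[0]]
```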

mdml answered Oct 13 '22 00:10