
How to remove multiple values from an array at once

Can someone provide me with a better (simpler, more readable, more Pythonic, more efficient, etc.) way to remove multiple values from an array than what follows:

import numpy as np

# The array.
x = np.linspace(0, 360, 37)

# The values to be removed.
a = 0
b = 180
c = 360

new_array = np.delete(x, np.where(np.logical_or(np.logical_or(x == a,
                                                              x == b),
                                                x == c)))

A good answer to this question would produce the same result as the above code (i.e., new_array), but might do a better job dealing with equality between floats than the above code does.

BONUS

Can someone explain to me why this produces the wrong result?

In [5]: np.delete(x, x == a)
/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:3254: FutureWarning: in the future insert will treat boolean arrays and array-likes as boolean index instead of casting it to integer
  "of casting it to integer", FutureWarning)
Out[5]: 
array([  20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,  100.,
        110.,  120.,  130.,  140.,  150.,  160.,  170.,  180.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.,  360.])

The values 0 and 10 have both been removed, rather than just 0 (a).

Note, x == a is as expected (so the problem is inside np.delete):

In [6]: x == a
Out[6]: 
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False, False], dtype=bool)

Note as well that np.delete(x, np.where(x == a)) produces the correct result. Thus, it appears to me that np.delete cannot handle Boolean indices.

asked Jun 06 '15 05:06 by dbliss




2 Answers

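You can also collect the indices of the values with np.where, flatten them with np.ravel, and then remove them using np.delete. (This relies on each value occurring the same number of times in x, so that the index arrays stack into a regular array.)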

In [32]: r =  [a,b,c]

In [33]: indx = np.ravel([np.where(x == i) for i in r])

In [34]: indx
Out[34]: array([ 0, 18, 36])

In [35]: np.delete(x, indx)
Out[35]: 
array([  10.,   20.,   30.,   40.,   50.,   60.,   70.,   80.,   90.,
        100.,  110.,  120.,  130.,  140.,  150.,  160.,  170.,  190.,
        200.,  210.,  220.,  230.,  240.,  250.,  260.,  270.,  280.,
        290.,  300.,  310.,  320.,  330.,  340.,  350.])
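As a variation on the same index-based idea, the sketch below builds the indices in one step instead of one np.where call per value. It assumes a NumPy release that provides np.isin (older releases offer np.in1d for the same purpose); the name r is taken from the answer above, with the values written out literally.

```python
import numpy as np

x = np.linspace(0, 360, 37)
r = [0, 180, 360]  # the values a, b, c from the question

# np.isin marks every element of x that equals any value in r;
# np.flatnonzero converts that Boolean mask into integer indices.
indx = np.flatnonzero(np.isin(x, r))

new_array = np.delete(x, indx)
```

Because the mask is computed over the whole array, this form does not depend on each value occurring the same number of times.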
answered Oct 13 '22 19:10 by styvane


Your code does seem a little complex. I wondered whether you had considered numpy's Boolean vector indexing.

Using the same setup as yours, I timed your code:

In [175]: %%timeit
   .....: np.delete(x, np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))
   .....:
10000 loops, best of 3: 32.9 µs per loop

I then timed three successive applications of Boolean indexing, one per excluded value.

In [176]: %%timeit
   .....: x1 = x[x != a]
   .....: x2 = x1[x1 != b]
   .....: new_array = x2[x2 != c]
   .....:
100000 loops, best of 3: 6.56 µs per loop

Finally, for programming convenience and to extend the technique to an arbitrary number of excluded values, I rewrote the same code as a loop. It is a little slower, partly because of the initial copy, but it's still quite respectable. (The copy is not strictly required: Boolean indexing always returns a new array, so x itself is never modified.)

In [177]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[new_array != val]
   .....:
100000 loops, best of 3: 7.61 µs per loop

I think the real gain is in programming clarity, though. I also thought it best to verify that the three approaches give the same results ...

In [179]: new_array1 = np.delete(x,
   .....:                 np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))

In [180]: x1 = x[x != a]

In [181]: x2 = x1[x1 != b]

In [182]: new_array2 = x2[x2 != c]

In [183]: new_array3 = x.copy()

In [184]: for val in (a, b, c):
   .....:         new_array3 = new_array3[new_array3 != val]
   .....:

In [185]: all(new_array1 == new_array2)
Out[185]: True

In [186]: all(new_array1 == new_array3)
Out[186]: True
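The built-in all() applied to an elementwise comparison works here because the arrays have the same shape. As a hedged alternative, np.array_equal compares shape and contents in one call and returns False rather than misbehaving when the lengths differ. The sketch below repeats the setup so that it runs on its own; the variable names follow the session above.

```python
import numpy as np

x = np.linspace(0, 360, 37)
a, b, c = 0, 180, 360

# The original approach: look up the indices, then delete them.
new_array1 = np.delete(x, np.where((x == a) | (x == b) | (x == c))[0])

# The loop approach: filter out one value at a time.
new_array3 = x
for val in (a, b, c):
    new_array3 = new_array3[new_array3 != val]

# True only if both arrays have the same shape and the same elements.
same = np.array_equal(new_array1, new_array3)
```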

To handle the issue of floating-point comparisons you can use numpy's isclose() function, which tests equality within a tolerance rather than exactly. As expected, this sends the timing to hell:

In [188]: %%timeit
   .....: new_array = x.copy()
   .....: for val in (a, b, c):
   .....:     new_array = new_array[~np.isclose(new_array, val)]
   .....:
10000 loops, best of 3: 126 µs per loop
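One possible way to avoid calling isclose() once per value is to compare against all excluded values at once using broadcasting. This is only a sketch: it relies on the default tolerances of np.isclose, the name values is introduced here for illustration, and it has not been timed against the loop above.

```python
import numpy as np

x = np.linspace(0, 360, 37)
values = np.array([0.0, 180.0, 360.0])  # a, b, c from the question

# x[:, None] has shape (37, 1) and values has shape (3,), so the
# comparison broadcasts to a (37, 3) Boolean array.
close = np.isclose(x[:, None], values)

# Drop every element that is close to at least one excluded value.
new_array = x[~close.any(axis=1)]
```

The intermediate array has one column per excluded value, so memory use grows with the number of values; for a long list of values the loop may be the more economical choice.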

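The answer to your bonus question is contained within the warning, but the warning isn't very useful unless you know that False and True compare numerically equal to zero and one respectively. np.delete casts the Boolean array to integers, so it receives a single index 1 followed by 36 indices 0, and it deletes the elements at positions 1 and 0. That is why both 0 and 10 disappear. So your code is equivalent to

np.delete(x, (x == a).astype(int))

As the warning makes clear, the numpy team intends np.delete() to treat Boolean arguments as masks in a future release, but at present it only takes integer index arguments.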
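The difference can be reproduced without depending on how a particular NumPy release treats a Boolean argument to np.delete. The sketch below performs the integer cast explicitly to mimic the behaviour described above, and then shows two forms that remove only the matching element. The names mask, old_style, via_indices and via_mask are illustrative.

```python
import numpy as np

x = np.linspace(0, 360, 37)
a = 0

mask = (x == a)

# Mimic the behaviour behind the warning: the mask cast to integers is
# one 1 followed by zeros, so positions 1 and 0 are both deleted.
old_style = np.delete(x, mask.astype(int))

# Two unambiguous forms that remove only the element equal to a.
via_indices = np.delete(x, np.flatnonzero(mask))
via_mask = x[~mask]
```

Here old_style starts at 20, matching the output in the question, while via_indices and via_mask both start at 10.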

answered Oct 13 '22 17:10 by holdenweb