Can someone provide me with a better (simpler, more readable, more Pythonic, more efficient, etc.) way to remove multiple values from an array than what follows:
import numpy as np
# The array.
x = np.linspace(0, 360, 37)
# The values to be removed.
a = 0
b = 180
c = 360
new_array = np.delete(x, np.where(np.logical_or(np.logical_or(x == a,
                                                              x == b),
                                                x == c)))
A good answer to this question would produce the same result as the above code (i.e., new_array), but might do a better job dealing with equality between floats than the above code does.
Can someone explain to me why this produces the wrong result?
In [5]: np.delete(x, x == a)
/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py:3254: FutureWarning: in the future insert will treat boolean arrays and array-likes as boolean index instead of casting it to integer
"of casting it to integer", FutureWarning)
Out[5]:
array([ 20., 30., 40., 50., 60., 70., 80., 90., 100.,
110., 120., 130., 140., 150., 160., 170., 180., 190.,
200., 210., 220., 230., 240., 250., 260., 270., 280.,
290., 300., 310., 320., 330., 340., 350., 360.])
The values 0 and 10 have both been removed, rather than just 0 (a).
Note that x == a is as expected (so the problem is inside np.delete):
In [6]: x == a
Out[6]:
array([ True, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False], dtype=bool)
Note as well that np.delete(x, np.where(x == a)) produces the correct result. Thus, it appears to me that np.delete cannot handle Boolean indices.
You can also use np.ravel to get the indices of the values and then remove them using np.delete:
In [32]: r = [a,b,c]
In [33]: indx = np.ravel([np.where(x == i) for i in r])
In [34]: indx
Out[34]: array([ 0, 18, 36])
In [35]: np.delete(x, indx)
Out[35]:
array([ 10., 20., 30., 40., 50., 60., 70., 80., 90.,
100., 110., 120., 130., 140., 150., 160., 170., 190.,
200., 210., 220., 230., 240., 250., 260., 270., 280.,
290., 300., 310., 320., 330., 340., 350.])
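The np.ravel approach above relies on each value matching exactly once, since the per-value index arrays must all have the same length. A minimal sketch of an alternative, assuming a NumPy version that provides np.isin (the names mask, by_delete and by_mask are just illustrative):

```python
import numpy as np

x = np.linspace(0, 360, 37)
r = [0, 180, 360]  # values to remove (a, b, c from the question)

# np.isin builds one Boolean mask marking every element of x found in r.
mask = np.isin(x, r)

# Index form: flatnonzero returns a flat integer index array, even when
# a value matches zero times or several times.
indx = np.flatnonzero(mask)
by_delete = np.delete(x, indx)

# Mask form: keep everything that is not in r.
by_mask = x[~mask]
```

Both forms should give the same 34-element array here; the mask form skips np.delete entirely. Note that np.isin still tests exact equality.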
Your code does seem a little complex. I wondered whether you had considered numpy's Boolean vector indexing.
After the same setup as you I timed your code:
In [175]: %%timeit
.....: np.delete(x, np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))
.....:
10000 loops, best of 3: 32.9 µs per loop
I then timed three successive applications of Boolean indexing.
In [176]: %%timeit
.....: x1 = x[x != a]
.....: x2 = x1[x1 != b]
.....: new_array = x2[x2 != c]
.....:
100000 loops, best of 3: 6.56 µs per loop
Finally, for programming convenience and to extend the technique to an arbitrary number of excluded values I rewrote the same code as a loop. This will be a little slower, because of the need to make a copy first, but it's still quite respectable.
In [177]: %%timeit
.....: new_array = x.copy()
.....: for val in (a, b, c):
.....:     new_array = new_array[new_array != val]
.....:
100000 loops, best of 3: 7.61 µs per loop
I think the real gain is in programming clarity, though. Finally I thought it best to verify that the three algorithms were giving the same results ...
In [179]: new_array1 = np.delete(x,
.....: np.where(np.logical_or(np.logical_or(x == a, x == b), x == c)))
In [180]: x1 = x[x != a]
In [181]: x2 = x1[x1 != b]
In [182]: new_array2 = x2[x2 != c]
In [183]: new_array3 = x.copy()
In [184]: for val in (a, b, c):
.....:     new_array3 = new_array3[new_array3 != val]
.....:
In [185]: all(new_array1 == new_array2)
Out[185]: True
In [186]: all(new_array1 == new_array3)
Out[186]: True
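One caveat on the check above: all(a == b) compares element by element, so it only makes sense when both arrays have the same shape. A possible alternative, sketched here with np.array_equal (which also compares shapes), using the same setup as the question:

```python
import numpy as np

x = np.linspace(0, 360, 37)
a, b, c = 0, 180, 360

# Original approach, written with | instead of nested logical_or.
# [0] unpacks the single index array from the tuple np.where returns.
new_array1 = np.delete(x, np.where((x == a) | (x == b) | (x == c))[0])

# Loop approach. Boolean indexing returns a new array each time,
# so x itself is never modified.
new_array3 = x
for val in (a, b, c):
    new_array3 = new_array3[new_array3 != val]

# True only if shapes match and all elements are equal.
same = np.array_equal(new_array1, new_array3)
```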
To handle the issue of floating-point comparisons you need to use numpy's isclose() function. As expected, this sends the timing to hell:
In [188]: %%timeit
.....: new_array = x.copy()
.....: for val in (a, b, c):
.....:     new_array = new_array[~np.isclose(new_array, val)]
.....:
10000 loops, best of 3: 126 µs per loop
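If the loop is a concern, the tolerant comparison can also be done in one call by broadcasting. This is only a sketch, not timed here, and it uses np.isclose's default tolerances; the names vals and close are illustrative:

```python
import numpy as np

x = np.linspace(0, 360, 37)
vals = np.array([0.0, 180.0, 360.0])  # a, b, c from the question

# Compare every element of x against every value at once:
# x[:, None] has shape (37, 1) and vals has shape (3,),
# so the result has shape (37, 3).
close = np.isclose(x[:, None], vals)

# Drop an element if it is close to any of the values.
new_array = x[~close.any(axis=1)]
```

The intermediate array has len(x) * len(vals) entries, so this trades memory for fewer Python-level iterations.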
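The answer to your bonus is contained within the warning, but the warning isn't very useful unless you know that False and True compare numerically equal to zero and one respectively. np.delete casts the Boolean array to integers and treats them as positions, so your code is equivalent to
np.delete(x, [1, 0, 0, ..., 0])
which removes the elements at indices 0 and 1.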
As the warning makes clear, the numpy team intend to change how np.delete() handles Boolean arguments in the future, but at present it only takes index arguments.
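The casting can be reproduced explicitly. The sketch below converts the mask to integers by hand, so it does not depend on how a given NumPy version treats Boolean input to np.delete; the variable names are illustrative:

```python
import numpy as np

x = np.linspace(0, 360, 37)
a = 0

mask = x == a                 # [True, False, False, ...]
as_index = mask.astype(int)   # [1, 0, 0, ...]: True -> 1, False -> 0

# Passing those integers as positions deletes indices 1 and 0,
# which is why both 0. and 10. disappeared in the question.
wrong = np.delete(x, as_index)

# Two ways to remove only the matching element, regardless of how
# np.delete interprets Boolean arrays:
right_by_index = np.delete(x, np.flatnonzero(mask))
right_by_mask = x[~mask]
```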