Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove specific elements in a numpy array

People also ask

How do you delete specific elements from an array in Python?

For removing an element from an array, developers can also use the pop() method of the array. If we simply use pop() without passing any parameter, then it will remove the element from the last (n th) index. But if we specify the index value of the array, it will remove that particular element from the array.

How do I remove a specific element from an array?

pop() function: This method is use to remove elements from the end of an array. shift() function: This method is use to remove elements from the start of an array. splice() function: This method is use to remove elements from the specific index of an array.

How do you delete specific elements?

To remove a specific element from an array in JavaScript: Find the index of the element using the indexOf() function. Remove the element with the index using splice() function.


Use numpy.delete() - returns a new array with sub-arrays along an axis deleted

numpy.delete(a, index)

For your specific question:

import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
index = [2, 3, 6]

new_a = np.delete(a, index)

print(new_a) #Prints `[1, 2, 5, 6, 8, 9]`

Note that numpy.delete() returns a new array since array scalars are immutable, similar to strings in Python, so each time a change is made to it, a new object is created. I.e., to quote the delete() docs:

"A copy of arr with the elements specified by obj removed. Note that delete does not occur in-place..."

If the code I post has output, it is the result of running the code.


There is a numpy built-in function to help with that.

import numpy as np
>>> a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = np.array([3,4,7])
>>> c = np.setdiff1d(a,b)
>>> c
array([1, 2, 5, 6, 8, 9])

A Numpy array is immutable, meaning you technically cannot delete an item from it. However, you can construct a new array without the values you don't want, like this:

b = np.delete(a, [2,3,6])

To delete by value :

modified_array = np.delete(original_array, np.where(original_array == value_to_delete))

Using np.delete is the fastest way to do it, if we know the indices of the elements that we want to remove. However, for completeness, let me add another way of "removing" array elements using a boolean mask created with the help of np.isin. This method allows us to remove the elements by specifying them directly or by their indices:

import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

Remove by indices:

indices_to_remove = [2, 3, 6]
a = a[~np.isin(np.arange(a.size), indices_to_remove)]

Remove by elements (don't forget to recreate the original a since it was rewritten in the previous line):

elements_to_remove = a[indices_to_remove]  # [3, 4, 7]
a = a[~np.isin(a, elements_to_remove)]

Not being a numpy person, I took a shot with:

>>> import numpy as np
>>> import itertools
>>> 
>>> a = np.array([1,2,3,4,5,6,7,8,9])
>>> index=[2,3,6]
>>> a = np.array(list(itertools.compress(a, [i not in index for i in range(len(a))])))
>>> a
array([1, 2, 5, 6, 8, 9])

According to my tests, this outperforms numpy.delete(). I don't know why that would be the case, maybe due to the small size of the initial array?

python -m timeit -s "import numpy as np" -s "import itertools" -s "a = np.array([1,2,3,4,5,6,7,8,9])" -s "index=[2,3,6]" "a = np.array(list(itertools.compress(a, [i not in index for i in range(len(a))])))"
100000 loops, best of 3: 12.9 usec per loop

python -m timeit -s "import numpy as np" -s "a = np.array([1,2,3,4,5,6,7,8,9])" -s "index=[2,3,6]" "np.delete(a, index)"
10000 loops, best of 3: 108 usec per loop

That's a pretty significant difference (in the opposite direction to what I was expecting), anyone have any idea why this would be the case?

Even more weirdly, passing numpy.delete() a list performs worse than looping through the list and giving it single indices.

python -m timeit -s "import numpy as np" -s "a = np.array([1,2,3,4,5,6,7,8,9])" -s "index=[2,3,6]" "for i in index:" "    np.delete(a, i)"
10000 loops, best of 3: 33.8 usec per loop

Edit: It does appear to be to do with the size of the array. With large arrays, numpy.delete() is significantly faster.

python -m timeit -s "import numpy as np" -s "import itertools" -s "a = np.array(list(range(10000)))" -s "index=[i for i in range(10000) if i % 2 == 0]" "a = np.array(list(itertools.compress(a, [i not in index for i in range(len(a))])))"
10 loops, best of 3: 200 msec per loop

python -m timeit -s "import numpy as np" -s "a = np.array(list(range(10000)))" -s "index=[i for i in range(10000) if i % 2 == 0]" "np.delete(a, index)"
1000 loops, best of 3: 1.68 msec per loop

Obviously, this is all pretty irrelevant, as you should always go for clarity and avoid reinventing the wheel, but I found it a little interesting, so I thought I'd leave it here.


In case you don't have the indices of the elements you want to remove, you can use the function in1d provided by numpy.

The function returns True if the element of a 1-D array is also present in a second array. To delete the elements, you just have to negate the values returned by this function.

Notice that this method keeps the order from the original array.

In [1]: import numpy as np

        a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
        rm = np.array([3, 4, 7])
        # np.in1d return true if the element of `a` is in `rm`
        idx = np.in1d(a, rm)
        idx

Out[1]: array([False, False,  True,  True, False, False,  True, False, False])

In [2]: # Since we want the opposite of what `in1d` gives us, 
        # you just have to negate the returned value
        a[~idx]

Out[2]: array([1, 2, 5, 6, 8, 9])