 

Deleting certain elements from numpy array using conditional checks

I want to remove some entries from a numpy array that is about a million entries long.

This code would do it, but it would take a long time:

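import numpy as np
a = np.array([1,45,23,23,1234,3432,-1232,-34,233])
# Walk backwards so that deleting an element doesn't shift indices still to be checked
for i in reversed(range(len(a))):
    if a[i] < -100 or a[i] > 100:
        a = np.delete(a, i)  # np.delete returns a new array on every call, hence slow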

Can I do this any other way?

Asked by Abhinav Kumar, Jan 04 '14 06:01



2 Answers

I'm assuming you want to remove the elements where a < -100 or a > 100. The most concise way is to use boolean (logical) indexing:

a = a[(a >= -100) & (a <= 100)]

This doesn't actually "delete" the entries. Rather, it makes a copy of the array without the unwanted values and assigns it to the variable that previously referred to the old array. After that, the old array has no remaining references, so it is garbage collected and its memory is freed.
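A minimal sketch of the copy semantics described above, assuming a reasonably recent NumPy; the names `kept` and `dropped` are only for illustration. It also shows `np.delete`, which is NumPy's own "delete command": it takes the indices to drop rather than a condition and, like boolean indexing, returns a new array instead of shrinking the original.

```python
import numpy as np

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])

# Boolean indexing builds a brand-new array holding only the kept values.
kept = a[(a >= -100) & (a <= 100)]
print(kept)                        # [  1  45  23  23 -34]
print(np.shares_memory(a, kept))   # False: the data was copied
print(a.size)                      # 9: the original array is untouched

# np.delete takes the indices to drop and also returns a new array.
dropped = np.delete(a, np.flatnonzero((a < -100) | (a > 100)))
print(np.array_equal(kept, dropped))  # True
```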

It's worth noting that this method does not use constant extra memory: because it makes a copy, it needs additional memory proportional to the size of the array, and the old and new arrays both exist until the assignment completes. This could be a problem if your array is large enough to approach the memory limits of your machine. Actually removing each element "in place", i.e. with constant extra memory, would be a very different operation, because the remaining elements would need to be shifted down and the block of memory resized. I'm not sure you can do this with a numpy array; however, one thing you can do to avoid copying the data is to use a numpy masked array:

import numpy.ma as ma
# Mask (hide) every element outside [-100, 100]; the values in `a` are not copied
mx = ma.masked_array(a, mask=(a < -100) | (a > 100))

Masked-array operations such as sum() or mean() act as if the "deleted" elements don't exist. We haven't really deleted them, though: they are still in memory, and the array simply carries a mask recording which elements to skip, so the data itself is never copied (the boolean mask does take one byte per element). If we ever want the deleted values back, we can just remove the mask:

mx.mask = ma.nomask
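A short sketch of how the masked array behaves, using the question's sample data; the variable names are illustrative, while `np.shares_memory`, `MaskedArray.data` and `MaskedArray.compressed()` are standard NumPy. It checks that reductions skip masked entries, that the values are shared with `a` rather than copied, and that clearing the mask restores everything.

```python
import numpy as np
import numpy.ma as ma

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])
mx = ma.masked_array(a, mask=(a < -100) | (a > 100))

# Reductions skip masked entries: only 1, 45, 23, 23 and -34 are summed.
masked_total = mx.sum()
print(masked_total)   # 58

# The data buffer is shared with `a`; the values themselves were not copied.
shared = np.shares_memory(mx.data, a)
print(shared)         # True

# compressed() returns a plain ndarray of the unmasked values (this one IS a copy).
visible = mx.compressed()
print(visible)        # [  1  45  23  23 -34]

# Clearing the mask brings every original value back into play.
mx.mask = ma.nomask
full_total = mx.sum()
print(full_total)     # 3725
```

If you eventually need an ordinary array without the hidden values, `compressed()` gives you one, at the cost of the copy the masked array was avoiding.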
Answered by qwwqwwq, Oct 05 '22 06:10


You can index with a boolean mask built from the inverted condition, or from an equivalent positive condition:

>>> import numpy as np
>>> a = np.array([1,45,23,23,1234,3432,-1232,-34,233])

>>> a[~((a < -100) | (a > 100))]
array([  1,  45,  23,  23, -34])

>>> a[(a >= -100) & (a <= 100)]
array([  1,  45,  23,  23, -34])

>>> a[abs(a) <= 100]
array([  1,  45,  23,  23, -34])
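One caveat about the `abs(a) <= 100` form, shown as a small sketch: for signed integer dtypes, the absolute value of the most negative representable number overflows and stays negative, so that element passes the test. With the platform's default integer type (usually int64) this only affects its single most negative value, but it is easy to see with int8, whose range is -128..127:

```python
import numpy as np

# abs(-128) cannot be represented in int8, so np.abs wraps it back to -128.
a = np.array([-128, -34, 1, 45, 127], dtype=np.int8)

by_abs = a[np.abs(a) <= 100]             # wrongly keeps -128
by_range = a[(a >= -100) & (a <= 100)]   # keeps only -34, 1 and 45

print(by_abs.tolist())    # [-128, -34, 1, 45]
print(by_range.tolist())  # [-34, 1, 45]
```

The two-sided comparison `(a >= -100) & (a <= 100)` never computes an absolute value, so it does not have this edge case.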
Answered by falsetru, Oct 05 '22 06:10