Deleting certain elements from numpy array using conditional checks

Tags: python, arrays, numpy

I want to remove some entries from a numpy array that is about a million entries long.

This code would do it but take a long time:

import numpy as np

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])
for element in a:
    if element < -100 or element > 100:
        pass  # some delete command

Can I do this any other way?

617

asked Jan 04 '14 06:01

Abhinav Kumar

2 Answers

I'm assuming you mean a < -100 or a > 100. The most concise way is to use logical (boolean) indexing.

a = a[(a >= -100) & (a <= 100)]

This does not exactly "delete" the entries. It builds a copy of the array without the unwanted values and assigns it to the variable that previously referred to the old array. If nothing else references the old array, it is then garbage collected and its memory is freed.
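To make the copy semantics concrete, here is a small sketch. The name `original` is illustrative and not part of the question. It checks that boolean indexing returns a new array that shares no memory with the source, and that the source stays untouched as long as another name still refers to it.

```python
import numpy as np

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])
original = a  # hypothetical second reference, so the old array stays alive

# Boolean (mask) indexing builds a new array; it does not return a view.
a = a[(a >= -100) & (a <= 100)]

print(a)                              # the filtered copy
print(original.size)                  # the old array still has all 9 entries
print(np.shares_memory(a, original))  # False: the two arrays use separate buffers
```

Without the extra `original` reference, the old array would become unreachable after the assignment and its memory could be reclaimed.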

It's worth noting that this method does not use constant memory: because it copies the array, it uses memory linear in the size of the array. This could be a problem if your array is so large that it approaches the memory limit of your machine. Actually removing each element "in place", that is, using constant extra memory, would be a very different operation, since elements would have to be shifted around and the block of memory resized. I'm not sure you can do this with a numpy array. One thing you can do to avoid copying the data, however, is to use a numpy masked array:

import numpy.ma as ma
mx = ma.masked_array(a, mask=(a < -100) | (a > 100))

Operations on the masked array act as if the "deleted" elements don't exist, but they are not really deleted. They are still in memory; the array simply carries a record of which elements to skip, so the data never has to be copied. If we ever want the deleted values back, we can remove the mask like so:

mx.mask = ma.nomask
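As a rough illustration of how the masked array behaves, the sketch below reuses the sample array from the question. The variable names (`n_kept`, `total`, `kept`, `shares`, `n_after`) are made up for the example. Note that the mask itself is a boolean array of the same shape, so this approach avoids copying the data but is not free of extra memory.

```python
import numpy as np
import numpy.ma as ma

a = np.array([1, 45, 23, 23, 1234, 3432, -1232, -34, 233])
mx = ma.masked_array(a, mask=(a < -100) | (a > 100))

n_kept = mx.count()       # number of unmasked entries
total = mx.sum()          # sum over unmasked entries only
kept = mx.compressed()    # unmasked entries as a plain ndarray (this one is a copy)
shares = np.shares_memory(mx.data, a)  # the masked array wraps the original buffer

print(n_kept, total, kept, shares)

mx.mask = ma.nomask       # drop the mask; every entry is visible again
n_after = mx.count()
print(n_after)
```

Reductions such as `sum` skip masked entries, while `compressed()` is the call to reach for when a plain array of the surviving values is needed.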

136

answered Oct 05 '22 06:10

qwwqwwq

You can use boolean mask indexing with the inverted condition.

>>> a = np.array([1,45,23,23,1234,3432,-1232,-34,233])

>>> a[~((a < -100) | (a > 100))]
array([  1,  45,  23,  23, -34])

>>> a[(a >= -100) & (a <= 100)]
array([  1,  45,  23,  23, -34])

>>> a[abs(a) <= 100]
array([  1,  45,  23,  23, -34])
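As a sanity check at the scale mentioned in the question, the following sketch applies the three expressions above to a million random integers and confirms they agree. The seed, the value range, and the names `unwanted` and `r1` to `r4` are arbitrary choices for the example. It also shows `np.delete` with the positions of the unwanted entries, which is probably the closest thing to a "delete command"; like mask indexing, it returns a new array rather than modifying `a`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
a = rng.integers(-5000, 5000, size=1_000_000)

unwanted = (a < -100) | (a > 100)

r1 = a[~unwanted]                             # inverted condition
r2 = a[(a >= -100) & (a <= 100)]              # condition written directly
r3 = a[np.abs(a) <= 100]                      # same range via absolute value
r4 = np.delete(a, np.flatnonzero(unwanted))   # delete by index, returns a copy

print(np.array_equal(r1, r2), np.array_equal(r1, r3), np.array_equal(r1, r4))
print(r1.size, "of", a.size, "entries kept")
```

All four run as vectorized operations, so none of them needs a Python-level loop over the elements.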

answered Oct 05 '22 06:10

falsetru
