I am looking to apply a function to each row of a numpy array. If this function evaluates to true I will keep the row, otherwise I will discard it. For example, my function might be: <pre class="prettyprint"><code>def f(row): if sum(row)>10: return True else: return False </code></pre> I was wondering if there was something similar to: <pre class="prettyprint"><code>np.apply_over_axes() </code></pre> which applies a function to each row of a numpy array and returns the result. I was hoping for something like: <pre class="prettyprint"><code>np.filter_over_axes() </code></pre> which would apply a function to each row of an numpy array and only return rows for which the function returned true. Is there anything like this? Or should I just use a for loop?

Ideally, you would be able to implement a vectorized version of your function and use that to do boolean indexing. For the vast majority of problems this is the right solution. Numpy provides quite a few functions that can act over various axes as well as all the basic operations and comparisons, so most useful conditions should be vectorizable. <pre class="prettyprint"><code>import numpy as np x = np.random.randn(20, 3) x_new = x[np.sum(x, axis=1) > .5] </code></pre> If you are absolutely sure that you can't do the above, I would suggest using a list comprehension (or <code>np.apply_along_axis</code>) to create an array of bools to index with. <pre class="prettyprint"><code>def myfunc(row): return sum(row) > .5 bool_arr = np.array([myfunc(row) for row in x]) x_new = x[bool_arr] </code></pre> This will get the job done in a relatively clean way, but will be significantly slower than a vectorized version. An example: <pre class="prettyprint"><code>x = np.random.randn(5000, 200) %timeit x[np.sum(x, axis=1) > .5] # 100 loops, best of 3: 5.71 ms per loop %timeit x[np.array([myfunc(row) for row in x])] # 1 loops, best of 3: 217 ms per loop </code></pre>

Filter rows of a numpy array?

Tags:

python

filter

numpy

I am looking to apply a function to each row of a numpy array. If this function evaluates to true I will keep the row, otherwise I will discard it. For example, my function might be:

def f(row):     if sum(row)>10: return True     else: return False

I was wondering if there was something similar to:

np.apply_over_axes()

which applies a function to each row of a numpy array and returns the result. I was hoping for something like:

np.filter_over_axes()

which would apply a function to each row of an numpy array and only return rows for which the function returned true. Is there anything like this? Or should I just use a for loop?

867

asked Oct 02 '14 05:10

killajoule

1 Answers

Ideally, you would be able to implement a vectorized version of your function and use that to do boolean indexing. For the vast majority of problems this is the right solution. Numpy provides quite a few functions that can act over various axes as well as all the basic operations and comparisons, so most useful conditions should be vectorizable.

import numpy as np  x = np.random.randn(20, 3) x_new = x[np.sum(x, axis=1) > .5]

If you are absolutely sure that you can't do the above, I would suggest using a list comprehension (or np.apply_along_axis) to create an array of bools to index with.

def myfunc(row):     return sum(row) > .5  bool_arr = np.array([myfunc(row) for row in x]) x_new = x[bool_arr]

This will get the job done in a relatively clean way, but will be significantly slower than a vectorized version. An example:

x = np.random.randn(5000, 200)  %timeit x[np.sum(x, axis=1) > .5] # 100 loops, best of 3: 5.71 ms per loop  %timeit x[np.array([myfunc(row) for row in x])] # 1 loops, best of 3: 217 ms per loop

answered Oct 05 '22 13:10

Roger Fan

Related questions
                            
                                Python: Multiple packages in one repository or one package per repository?
                            
                                Python embeddable zip
                            
                                Getting Spark, Python, and MongoDB to work together
                            
                                Installer and Updater for a python desktop application
                            
                                Unsupervised pre-training for convolutional neural network in theano
                            
                                Simple, hassle-free, zero-boilerplate serialization in Scala/Java similar to Python's Pickle?
                            
                                Merge on single level of MultiIndex
                            
                                Why do I get an int when I index bytes?
                            
                                Grouping Functions by Using Classes in Python
                            
                                How to detect if the console does support ANSI escape codes in Python?
                            
                                How do I develop against OAuth locally?
                            
                                Pretty print JSON dumps
                            
                                Python 3.4: How to import a module given the full path? [duplicate]
                            
                                Avoid `logger=logging.getLogger(__name__)`
                            
                                Python+Celery: Chaining jobs?
                            
                                Packaging and shipping a python library and scripts, the professional way
                            
                                Error: No module named 'fcntl'
                            
                                Python - Best/Cleanest way to define constant lists or dictionarys
                            
                                Maximum size of pandas dataframe
                            
                                Test Flask render_template() context

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With