How does numpy broadcasting perform faster?

1 Answer

I will try to answer the second part of the question.

There, we are comparing:

A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]  (I)

and

A[~((A[:,None,:] == B).all(-1)).any(1)]

To put the second approach in the same form as the first one, so the two can be compared step by step, we could rewrite it like this:

A[(((~(A[:,None,:] == B)).any(2))).all(1)]         (II)
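As a quick sanity check that the rewrite (II) selects the same rows as the original second approach and as (I), here is a minimal sketch. The small random `A` and `B` are made-up test data, not the arrays from the question; `B` is sampled from the rows of `A` so that some rows are guaranteed to be filtered out.

```python
import numpy as np

# Hypothetical test data: B is drawn from A's rows, so matches exist.
rng = np.random.default_rng(0)
A = rng.integers(0, 3, size=(20, 2))
B = A[rng.choice(len(A), size=5, replace=False)]

# (I): subtract, then look for non-zero differences.
mask_I = np.all(np.any(A - B[:, None], axis=2), axis=0)

# Second approach as originally written.
mask_orig = ~((A[:, None, :] == B).all(-1)).any(1)

# (II): the same thing with the negation pushed inside (De Morgan).
mask_II = (~(A[:, None, :] == B)).any(2).all(1)

assert np.array_equal(mask_I, mask_orig)
assert np.array_equal(mask_orig, mask_II)
print(A[mask_II])  # rows of A that do not appear in B
```

The equivalence follows from De Morgan's laws: "not any(all(equal))" is the same as "all(any(not equal))", which is also what (I) computes when a non-zero difference is read as "not equal".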

The major performance difference is that the first approach finds non-matches by subtraction and then checks for non-zeros with .any(), so .any() has to operate on an array of non-boolean (integer) dtype. The second approach instead feeds it a boolean array obtained from A[:,None,:] == B.

Let's do a small runtime test to see how .any() performs on an int array vs. a boolean array:
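To make the dtype difference concrete, the following sketch inspects the intermediate arrays that each approach hands to .any(). The tiny `A` and `B` here are illustrative stand-ins, not the data from the question.

```python
import numpy as np

# Illustrative inputs: 6 rows in A, the first 3 of them reused as B.
A = np.arange(12).reshape(6, 2)
B = A[:3]

diff = A - B[:, None]      # approach (I): integer differences
eq = A[:, None, :] == B    # approach (II): boolean comparison result

print(diff.dtype, diff.shape, diff.itemsize)
print(eq.dtype, eq.shape, eq.itemsize)
```

Both intermediates hold the same number of elements, but the boolean one uses a single byte per element, while the integer one uses the platform's default integer width.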

In [141]: A = np.random.randint(0,9,(1000,1000)) # An int array

In [142]: %timeit A.any(0)
1000 loops, best of 3: 1.43 ms per loop

In [143]: A = np.random.randint(0,9,(1000,1000))>5 # A boolean array

In [144]: %timeit A.any(0)
10000 loops, best of 3: 164 µs per loop

So, with a speedup of close to 9x on this step, there is a clear advantage to using .any() on boolean arrays. I think this is the main reason the second approach is faster.
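For anyone who wants to rerun the comparison outside IPython, here is a rough standalone sketch using the standard-library timeit module. The array shape and value range mirror the session above; the repeat counts are arbitrary choices, and the absolute timings (and the ratio) will likely differ across machines and NumPy versions.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
int_arr = rng.integers(0, 9, size=(1000, 1000))  # an int array
bool_arr = int_arr > 5                           # a boolean array

n = 100  # calls per timing run (arbitrary)
t_int = min(timeit.repeat(lambda: int_arr.any(0), number=n, repeat=3)) / n
t_bool = min(timeit.repeat(lambda: bool_arr.any(0), number=n, repeat=3)) / n

print(f"int  array: {t_int * 1e6:.0f} µs per call")
print(f"bool array: {t_bool * 1e6:.0f} µs per call")
```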

102

answered Oct 25 '22 15:10

Divakar


Tags:

python

numpy

R. S. Nikhil Krishna
