Maybe I'm doing something odd, but I've found a surprising performance loss when using numpy, and it seems consistent regardless of the power used. For instance, when x is a random 100x100 array,
x = numpy.power(x,3)
is about 60x slower than
x = x*x*x
A plot of the speed-up for various array sizes reveals a sweet spot around 10k elements, with a consistent 5-10x speed-up at other sizes.
Code to test on your own machine is below (a little messy):
import numpy as np
from matplotlib import pyplot as plt
from time import time

ratios = []
sizes = []
for n in np.logspace(1, 3, 20).astype(int):
    a = np.random.randn(n, n)

    inline_times = []
    for i in range(100):
        t = time()
        b = a*a*a
        inline_times.append(time() - t)
    inline_time = np.mean(inline_times)

    pow_times = []
    for i in range(100):
        t = time()
        b = np.power(a, 3)
        pow_times.append(time() - t)
    pow_time = np.mean(pow_times)

    sizes.append(a.size)
    ratios.append(pow_time / inline_time)

plt.plot(sizes, ratios)
plt.title('Performance of inline vs numpy.power')
plt.ylabel('Nx speed-up using inline')
plt.xlabel('Array size')
plt.xscale('log')
plt.show()
Anyone have an explanation?
It's well known that multiplication of doubles, which your processor can do in a very fancy way, is very, very fast. pow is decidedly slower.
Some performance guides even advise people to plan around this, sometimes to a degree that borders on overzealous.
numpy special-cases squaring to make sure it's not too, too slow, but it sends cubing right off to your libc's pow, which isn't nearly as fast as a couple of multiplications.
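As a rough illustration of that special-casing (a sketch, assuming standard NumPy; exact numbers will vary by platform, libc, and NumPy version), you can time squaring and cubing side by side:

import numpy as np
import timeit

a = np.random.randn(100, 100)

# power(a, 2) is special-cased and should land close to a*a,
# while power(a, 3) falls through to the generic pow path.
for label, fn in [
    ('a*a        ', lambda: a * a),
    ('power(a, 2)', lambda: np.power(a, 2)),
    ('a*a*a      ', lambda: a * a * a),
    ('power(a, 3)', lambda: np.power(a, 3)),
]:
    print(label, timeit.timeit(fn, number=1000))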
I suspect the issue is that np.power always does float exponentiation, and it doesn't know how to optimize or vectorize that on your platform (or, probably, most/all platforms), while multiplication is easy to toss into SSE, and pretty fast even if you don't.
Even if np.power were smart enough to do integer exponentiation separately, unless it unrolled small values into repeated multiplication, it still wouldn't be nearly as fast.
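For comparison, here is a minimal sketch of what such unrolling could look like; int_power_unrolled is a hypothetical helper, not part of NumPy:

import numpy as np

def int_power_unrolled(a, n):
    # Hypothetical sketch: compute a**n for a small non-negative integer n
    # by repeated multiplication (exponentiation by squaring), avoiding the
    # generic float pow routine entirely.
    result = np.ones_like(a)
    base = a
    while n > 0:
        if n & 1:
            result = result * base
        n >>= 1
        if n:
            base = base * base
    return result

For n = 3 this costs a few elementwise multiplications rather than a transcendental pow call per element.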
You can verify this pretty easily by comparing the time for int-to-int, int-to-float, float-to-int, and float-to-float powers against multiplication for a small array; int-to-int is about 5x as fast as the others, but still 4x slower than multiplication. (I tested with PyPy and a customized NumPy, though, so it would be better for someone with the normal NumPy installed on CPython to give real results…)
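A minimal sketch of that comparison, assuming the normal NumPy on CPython (my absolute ratios above came from PyPy, so expect different numbers):

import numpy as np
import timeit

ai = np.arange(1, 101, dtype=np.int64)  # small integer array
af = ai.astype(np.float64)              # the same values as floats

for label, fn in [
    ('int ** int     ', lambda: ai ** 3),
    ('int ** float   ', lambda: ai ** 3.0),
    ('float ** int   ', lambda: af ** 3),
    ('float ** float ', lambda: af ** 3.0),
    ('float multiply ', lambda: af * af * af),
]:
    print(label, timeit.timeit(fn, number=10000))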