Maybe I'm doing something odd, but I've found a surprising performance loss when using numpy, and it seems consistent regardless of the power used. For instance, when x is a random 100x100 array,
x = numpy.power(x,3)
is about 60x slower than
x = x*x*x
A plot of the speed-up for various array sizes reveals a sweet spot around 10k elements, with a consistent 5-10x speed-up at other sizes.
Code to test on your own machine is below (a little messy):
import numpy as np
from matplotlib import pyplot as plt
from time import time

ratios = []
sizes = []
for n in np.logspace(1, 3, 20).astype(int):
    a = np.random.randn(n, n)

    inline_times = []
    for i in range(100):
        t = time()
        b = a*a*a
        inline_times.append(time() - t)
    inline_time = np.mean(inline_times)

    pow_times = []
    for i in range(100):
        t = time()
        b = np.power(a, 3)
        pow_times.append(time() - t)
    pow_time = np.mean(pow_times)

    sizes.append(a.size)
    ratios.append(pow_time / inline_time)

plt.plot(sizes, ratios)
plt.title('Performance of inline vs numpy.power')
plt.ylabel('Nx speed-up using inline')
plt.xlabel('Array size')
plt.xscale('log')
plt.show()
Anyone have an explanation?
It's well known that multiplication of doubles, which your processor can do in a very fancy way, is very, very fast. pow is decidedly slower.
Some performance guides even advise people to plan around this, sometimes to a degree that borders on overzealous.
numpy special-cases squaring to make sure it's not too, too slow, but it sends cubing right off to your libc's pow, which isn't nearly as fast as a couple of multiplications.
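As a rough illustration of that special-casing (a sketch, assuming standard NumPy; exact numbers will vary by platform, libc, and NumPy version), you can time squaring and cubing side by side:

import numpy as np
import timeit

a = np.random.randn(100, 100)

# power(a, 2) is special-cased and should land close to a*a,
# while power(a, 3) falls through to the generic pow path.
for label, fn in [
    ('a*a        ', lambda: a * a),
    ('power(a, 2)', lambda: np.power(a, 2)),
    ('a*a*a      ', lambda: a * a * a),
    ('power(a, 3)', lambda: np.power(a, 3)),
]:
    print(label, timeit.timeit(fn, number=1000))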
I suspect the issue is that np.power always does float exponentiation, and it doesn't know how to optimize or vectorize that on your platform (or, probably, most/all platforms), while multiplication is easy to toss into SSE, and pretty fast even if you don't.
Even if np.power were smart enough to do integer exponentiation separately, unless it unrolled small values into repeated multiplication, it still wouldn't be nearly as fast.
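For comparison, here is a minimal sketch of what such unrolling could look like; int_power_unrolled is a hypothetical helper, not part of NumPy:

import numpy as np

def int_power_unrolled(a, n):
    # Hypothetical sketch: compute a**n for a small non-negative integer n
    # by repeated multiplication (exponentiation by squaring), avoiding the
    # generic float pow routine entirely.
    result = np.ones_like(a)
    base = a
    while n > 0:
        if n & 1:
            result = result * base
        n >>= 1
        if n:
            base = base * base
    return result

For n = 3 this costs a few elementwise multiplications rather than a transcendental pow call per element.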
You can verify this pretty easily by comparing the time for int-to-int, int-to-float, float-to-int, and float-to-float powers against multiplication for a small array; int-to-int is about 5x as fast as the others, but still 4x slower than multiplication. (I tested with PyPy and a customized NumPy, though, so it would be better for someone with the normal NumPy installed on CPython to give real results…)
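A minimal sketch of that comparison, assuming the normal NumPy on CPython (my absolute ratios above came from PyPy, so expect different numbers):

import numpy as np
import timeit

ai = np.arange(1, 101, dtype=np.int64)  # small integer array
af = ai.astype(np.float64)              # the same values as floats

for label, fn in [
    ('int ** int     ', lambda: ai ** 3),
    ('int ** float   ', lambda: ai ** 3.0),
    ('float ** int   ', lambda: af ** 3),
    ('float ** float ', lambda: af ** 3.0),
    ('float multiply ', lambda: af * af * af),
]:
    print(label, timeit.timeit(fn, number=10000))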