I have a set of n vectors stored in a 3 x n matrix z. I compute their outer products using np.einsum. When I timed it with:
%timeit v=np.einsum('i...,j...->ij...',z,z)
I got the result:
The slowest run took 7.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.9 µs per loop
What is happening here, and can it be avoided? The best of 3 is 2.9 µs, but the slowest run may be more typical.
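For reference, a minimal, self-contained version of what I am doing (the size n and the random data here are just an example):

```python
import numpy as np

# Example data: n vectors stored as the columns of a 3 x n array
n = 1000
z = np.random.rand(3, n)

# Outer product of each column with itself; the result has shape (3, 3, n),
# so v[:, :, k] is the 3 x 3 outer product of the k-th vector with itself
v = np.einsum('i...,j...->ij...', z, z)
print(v.shape)                                               # (3, 3, n)
print(np.allclose(v[:, :, 0], np.outer(z[:, 0], z[:, 0])))   # True
```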
The message "intermediate result is being cached" is just a blind guess in the canned message reported by %timeit. It may or may not be true, and you should not assume it is correct.
In particular, one of the most common reasons for the first run being slowest is that the array is in the CPU cache only after the first run.
CPUs cache data automatically; you cannot avoid this, and you would not really want to. That said, arranging algorithms so that the CPU caches are used effectively is one of the main concerns in high-performance computing today.
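If you want to see this for yourself, you can do an untimed warm-up call before timing, or use timeit.repeat to look at the spread across independent runs. A rough sketch (the array size and loop counts are assumptions, not taken from the question):

```python
import timeit
import numpy as np

n = 1000
z = np.random.rand(3, n)          # assumed test data, matching the question's shape
op = lambda: np.einsum('i...,j...->ij...', z, z)

op()                               # untimed warm-up call: pulls z into the CPU cache

# Five independent timing batches; the spread between the fastest and
# slowest batch is the variation that %timeit's warning refers to
times = timeit.repeat(op, number=10000, repeat=5)
print(["%.2f us/call" % (t / 10000 * 1e6) for t in times])
```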