I have NumPy compiled with OpenBLAS, and I am wondering why einsum is so much slower than dot. (I understand it in the three-index case, but I don't understand why it is also slower in the two-index case.) Here is an example:
import numpy as np
A = np.random.random([1000,1000])
B = np.random.random([1000,1000])
%timeit np.dot(A,B)
Out: 10 loops, best of 3: 26.3 ms per loop
%timeit np.einsum("ij,jk",A,B)
Out: 5 loops, best of 3: 477 ms per loop
Is there a way to make einsum use OpenBLAS and parallelization the way numpy.dot does? Why doesn't np.einsum simply call np.dot when it detects a dot product?
einsum parses the index string, constructs an nditer object, and uses that to perform a sum-of-products iteration. It has special cases where the indexes only describe axis swaps or diagonals ('ii->i'). It may also have special cases for 2 and 3 operands (as opposed to more). But it does not make any attempt to invoke external libraries.
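To make the "sum-of-products iteration" concrete, here is a rough pure-Python sketch of what "ij,jk" asks for. It is not how einsum is implemented internally (that is C code built around nditer); the function name naive_einsum_ij_jk is made up for illustration. It only shows that the result is a plain loop over the repeated index j, which is the same matrix product that dot hands to BLAS.

```python
import numpy as np

def naive_einsum_ij_jk(A, B):
    """Illustrative sum-of-products for 'ij,jk->ik' (hypothetical helper, not NumPy's code)."""
    n, m = A.shape
    m2, p = B.shape
    if m != m2:
        raise ValueError("inner dimensions must match")
    out = np.zeros((n, p), dtype=np.result_type(A, B))
    for i in range(n):
        for k in range(p):
            # Sum over the repeated index j; i and k survive into the output.
            out[i, k] = sum(A[i, j] * B[j, k] for j in range(m))
    return out

# Small arrays, since the Python loops are slow.
A = np.random.random((4, 5))
B = np.random.random((5, 3))
result = naive_einsum_ij_jk(A, B)

print(np.allclose(result, np.einsum("ij,jk", A, B)))
print(np.allclose(result, np.dot(A, B)))
```

Both checks compare against the same mathematical result; the timing difference in the question comes from how that result is computed, not from what is computed.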
I worked out a pure-Python workalike, but with more focus on the parsing than on the special cases in the calculation.
tensordot reshapes and swaps axes so that it can then call dot to do the actual calculations.
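A small sketch of that reshape-then-dot idea, using a three-index contraction. The shapes and variable names are arbitrary choices for illustration. The last lines assume a NumPy version whose einsum accepts the optimize argument; with it, einsum may hand suitable contractions to tensordot-style code, so it is worth timing on your own build.

```python
import numpy as np

A = np.random.random((3, 4, 5))
B = np.random.random((4, 5, 6))

# Contract A's axes (1, 2) with B's axes (0, 1).
via_einsum = np.einsum("ijk,jkl->il", A, B)
via_tensordot = np.tensordot(A, B, axes=([1, 2], [0, 1]))

# The same contraction done by hand: flatten the contracted axes
# into one axis of length 4 * 5 = 20, then use an ordinary 2-D dot.
via_dot = np.dot(A.reshape(3, 20), B.reshape(20, 6))

print(np.allclose(via_einsum, via_tensordot))
print(np.allclose(via_einsum, via_dot))

# Assumption: the installed NumPy supports einsum's `optimize` keyword.
X = np.random.random((50, 60))
Y = np.random.random((60, 70))
via_optimized = np.einsum("ij,jk", X, Y, optimize=True)
print(np.allclose(via_optimized, np.dot(X, Y)))
```

For the two-index case in the question, calling np.dot (or np.tensordot) directly remains the most straightforward way to get the BLAS-backed path.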