I'm performing a decently complex operation on some 3- and 4-dimensional tensor using numpy einsum.
My actual code is
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi)
This does what I want it to do.
Using einsum_path, the result is:
>>> path = np.einsum_path('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi)
>>> print(path[0])
['einsum_path', (0, 1), (0, 3), (0, 1), (0, 1)]
>>> print(path[1])
Complete contraction: oij,imj,mjkn,lnk,plk->op
Naive scaling: 8
Optimized scaling: 5
Naive FLOP count: 2.668e+07
Optimized FLOP count: 1.340e+05
Theoretical speedup: 199.136
Largest intermediate: 7.700e+02 elements
--------------------------------------------------------------------------
scaling current remaining
--------------------------------------------------------------------------
4 imj,oij->moj mjkn,lnk,plk,moj->op
5 moj,mjkn->nok lnk,plk,nok->op
4 plk,lnk->npk nok,npk->op
4 npk,nok->op op->op
This indicates a theoretical speedup of about 200x.
How can I use this result to speed up my code? How do I "implement" what einsum_path is telling me?
einsum. Evaluates the Einstein summation convention on the operands. Using the Einstein summation convention, many common multi-dimensional, linear algebraic array operations can be represented in a simple fashion.
einsum is clearly faster. Actually, twice as fast as numpy's built-in functions and, well, 6 times faster than loops, in this case.
einsum is ~20X slower than manually multiplying and summing #32591.
Do some time tests
path = np.einsum_path('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi)
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi, optimize=False)
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi, optimize=True)
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi, optimize=path[0])
In my testing the second 2 run at the same speed. For a small problem optimize=False
is faster, presumably because the analysis and rearranging takes time. For a large problem, with a larger theoretical speedup, the actual speedup for True
can be larger than than theory. Presumably memory management is slowing down the False
case.
The theoretical speedup
is just that, an estimate based just on FLOPS count. That will be true only to the extent that FLOPS dominate the calculation.
You could also time the path
calc. The size of the problem will determine whether its time is a small or large part of the total time.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With