I'm performing a fairly complex operation on some 3- and 4-dimensional tensors using numpy einsum.
My actual code is
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi)
This does what I want it to do.
Using einsum_path, the result is:
>>> path = np.einsum_path('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi)
>>> print(path[0])
['einsum_path', (0, 1), (0, 3), (0, 1), (0, 1)]
>>> print(path[1])
  Complete contraction:  oij,imj,mjkn,lnk,plk->op
         Naive scaling:  8
     Optimized scaling:  5
      Naive FLOP count:  2.668e+07
  Optimized FLOP count:  1.340e+05
   Theoretical speedup:  199.136
  Largest intermediate:  7.700e+02 elements
--------------------------------------------------------------------------
scaling                  current                                remaining
--------------------------------------------------------------------------
   4                imj,oij->moj                     mjkn,lnk,plk,moj->op
   5               moj,mjkn->nok                          lnk,plk,nok->op
   4                plk,lnk->npk                              nok,npk->op
   4                 npk,nok->op                                   op->op
This indicates a theoretical speedup of about 200x.
How can I use this result to speed up my code? How do I "implement" what einsum_path is telling me?
Do some time tests:
path = np.einsum_path('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi)
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi, optimize=False)
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi, optimize=True)         
np.einsum('oij,imj,mjkn,lnk,plk->op',phi,B,Suu,B,phi, optimize=path[0])
In my testing the last two run at the same speed.  For a small problem optimize=False is faster, presumably because the analysis and rearranging take time.  For a large problem, with a larger theoretical speedup, the actual speedup for True can be larger than the theory.  Presumably memory management is slowing down the False case.
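A minimal, self-contained timing sketch of those three calls.  The array shapes here are my own assumptions, chosen only so the contraction is valid (o=p=10, i=l=4, j=k=5, m=n=6); substitute your real phi, B, and Suu.

```python
import numpy as np
import timeit

# Dummy operands (shapes are illustrative assumptions, not from the question)
rng = np.random.default_rng(0)
phi = rng.standard_normal((10, 4, 5))    # used as 'oij' and 'plk'
B   = rng.standard_normal((4, 6, 5))     # used as 'imj' and 'lnk'
Suu = rng.standard_normal((6, 5, 5, 6))  # 'mjkn'

subs = 'oij,imj,mjkn,lnk,plk->op'
# Precompute the contraction order once; path is the ['einsum_path', ...] list
path = np.einsum_path(subs, phi, B, Suu, B, phi)[0]

for label, opt in [('False', False), ('True', True), ('path', path)]:
    t = timeit.timeit(
        lambda: np.einsum(subs, phi, B, Suu, B, phi, optimize=opt),
        number=1000)
    print(f'optimize={label}: {t:.4f} s for 1000 calls')
```

All three variants return the same (10, 10) array; only the evaluation order, and hence the speed, differs.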
The theoretical speedup is just that, an estimate based only on the FLOP count.  It will hold only to the extent that FLOPs dominate the calculation.
You could also time the path calculation itself.  The size of the problem determines whether that time is a small or large part of the total.
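A sketch of timing the path search on its own, again with dummy arrays whose shapes are my own illustrative assumptions:

```python
import numpy as np
import timeit

# Dummy operands (shapes are illustrative assumptions, not from the question)
rng = np.random.default_rng(0)
phi = rng.standard_normal((10, 4, 5))
B   = rng.standard_normal((4, 6, 5))
Suu = rng.standard_normal((6, 5, 5, 6))
subs = 'oij,imj,mjkn,lnk,plk->op'

# How long does the path search itself take per call?
t = timeit.timeit(
    lambda: np.einsum_path(subs, phi, B, Suu, B, phi),
    number=100)
print(f'einsum_path: {t / 100:.2e} s per call')
```

If the contraction is evaluated many times with the same shapes, computing the path once and passing it via optimize= amortizes this cost to zero.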