
The `out` argument in `numpy.einsum` does not work as expected

I have two pieces of code. The first one is:

import numpy as np
A = np.arange(3*4*3).reshape(3, 4, 3)
P = np.arange(1, 4)
A[:, 1:, :] = np.einsum('j, ijk->ijk', P, A[:, 1:, :])

and the resulting A is:

array([[[  0,   1,   2],
        [  3,   4,   5],
        [ 12,  14,  16],
        [ 27,  30,  33]],

       [[ 12,  13,  14],
        [ 15,  16,  17],
        [ 36,  38,  40],
        [ 63,  66,  69]],

       [[ 24,  25,  26],
        [ 27,  28,  29],
        [ 60,  62,  64],
        [ 99, 102, 105]]])

The second one is:

import numpy as np
A = np.arange(3*4*3).reshape(3, 4, 3)
P = np.arange(1, 4)
np.einsum('j, ijk->ijk', P, A[:, 1:, :], out=A[:, 1:, :])

and the resulting A is:

array([[[ 0,  1,  2],
        [ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0]],

       [[12, 13, 14],
        [ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0]],

       [[24, 25, 26],
        [ 0,  0,  0],
        [ 0,  0,  0],
        [ 0,  0,  0]]])

So the results differ. I want to use `out` to save memory. Is this a bug in `numpy.einsum`, or am I missing something?

By the way, my numpy version is 1.13.3.

asked Nov 26 '17 at 01:11 by Huayi Wei


2 Answers

I haven't used this new `out` parameter before, but I have worked with `einsum` in the past and have a general idea of how it works (or at least how it used to).

It looks to me like it initializes the `out` array to zero before iteration starts. That would account for all the 0s in the `A[:,1:,:]` block. If I instead initialize a separate `out` array, the desired values are inserted:

In [471]: B = np.ones((3,4,3),int)
In [472]: np.einsum('j, ijk->ijk', P, A[:, 1:, :], out=B[:,1:,:])
Out[472]: 
array([[[  3,   4,   5],
        [ 12,  14,  16],
        [ 27,  30,  33]],

       [[ 15,  16,  17],
        [ 36,  38,  40],
        [ 63,  66,  69]],

       [[ 27,  28,  29],
        [ 60,  62,  64],
        [ 99, 102, 105]]])
In [473]: B
Out[473]: 
array([[[  1,   1,   1],
        [  3,   4,   5],
        [ 12,  14,  16],
        [ 27,  30,  33]],

       [[  1,   1,   1],
        [ 15,  16,  17],
        [ 36,  38,  40],
        [ 63,  66,  69]],

       [[  1,   1,   1],
        [ 27,  28,  29],
        [ 60,  62,  64],
        [ 99, 102, 105]]])

The Python portion of `einsum` doesn't tell me much, except how it decides to pass the `out` array to the C portion (through `einsum_kwargs`, alongside the list of `tmp_operands`):

c_einsum(einsum_str, *tmp_operands, **einsum_kwargs)

I know that it sets up a C-API equivalent of `np.nditer`, using the subscripts string to define the axes and the iteration.

It iterates much like the reduction example in the iteration tutorial:

https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.nditer.html#reduction-iteration

Notice in particular the step that sets the output operand to 0 (`it.operands[1][...] = 0`) before `it.reset()` and the loop. That gives the output a clean slate before iterating. It then iterates over the elements of the input arrays and the output array, writing the calculated values to the output element. Since it is doing a sum of products (e.g. `out[:] += ...`), it has to start from zero.
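The zero-then-accumulate pattern from that tutorial can be sketched with `np.nditer` directly. `reduce_rows` below is a hypothetical helper, not NumPy's actual `einsum` internals; it only mimics the documented reduction example to show why an output that aliases an input gets clobbered. It assumes NumPy >= 1.15 for the `with it:` form.

```python
import numpy as np


def reduce_rows(a, out):
    """Hypothetical helper: out[i] = sum_k a[i, k] via the nditer reduction pattern."""
    it = np.nditer(
        [a, out],
        flags=["reduce_ok"],                     # out is broadcast along axis 1
        op_flags=[["readonly"], ["readwrite"]],
        op_axes=[None, [0, -1]],                 # -1 marks the reduced axis for out
    )
    with it:
        it.operands[1][...] = 0                  # clean slate, like einsum's C code
        for x, y in it:
            y[...] += x                          # accumulate the sum into out
    return out


a = np.arange(6.0).reshape(2, 3)

# Separate output buffer: correct row sums.
good = reduce_rows(a, np.empty(2))

# Output aliases the first column of the input: the zeroing step
# overwrites b[:, 0] before those input values are ever read.
b = a.copy()
aliased = reduce_rows(b, b[:, 0])

print(good)
print(aliased)
```

With the separate buffer the row sums are correct; with the aliased buffer the second row's sum loses its original first element, which is the same kind of corruption seen in the question.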

I'm guessing a bit as to what is actually going on, but it seems logical to me that it should zero out the output buffer to start with. If that array is the same as one of the inputs, that will end up messing with the calculation.

So I don't think this approach will work or save you memory. It needs a clean buffer to accumulate the results in. Once that's done, it (or you) can write the values back into `A`. But given the nature of a dot-like product, you can't use the same array for both input and output.

In [476]: A[:,1:,:] = np.einsum('j, ijk->ijk', P, A[:, 1:, :])
In [477]: A
Out[477]: 
array([[[  0,   1,   2],
        [  3,   4,   5],
        [ 12,  14,  16],
        [ 27,  30,  33]],
        ....)
answered Jan 04 '23 at 02:01 by hpaulj


In the C source code for `einsum`, there is a section that takes the array specified by `out` and sets it to zero.

But in the Python source code, for example, there are execution paths that call `tensordot` before the arguments are ever passed down to `c_einsum`.

This means that some operations might be pre-computed with `tensordot` (thus modifying your array `A` on some contraction passes) before any sub-array is set to zero by the zero-setting code inside the C implementation of `einsum`.

Another way to put it: on each pass at the next contraction, NumPy has several choices. Should it use `tensordot` directly, without descending into the C-level `einsum` code yet? Should it prepare the arguments and pass them to the C level (which involves overwriting some sub-view of the output array with zeros)? Or should it reorder the operations and repeat the check?

Depending on the order it chooses for these optimizations, you can end up with unexpected all-zeros sub-arrays.
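The Python-level planning described above can be inspected with `np.einsum_path`, which reports the sequence of pairwise contractions `np.einsum` would perform when `optimize` is enabled. A minimal sketch with three random matrices; the shapes and the `'greedy'` strategy are illustrative choices, not taken from the question, and the report does not promise to say which steps are dispatched to `tensordot`.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((2, 3))
b = rng.random((3, 4))
c = rng.random((4, 5))

# path: ['einsum_path', (i, j), ...], one tuple per pairwise contraction step
path, report = np.einsum_path("ij,jk,kl->il", a, b, c, optimize="greedy")
print(path)
print(report)  # human-readable summary of the chosen contraction order

# Reuse the precomputed path for the actual contraction.
result = np.einsum("ij,jk,kl->il", a, b, c, optimize=path)
```

Three operands need two pairwise steps, so the path holds two tuples after the `'einsum_path'` marker; each step is a point where the planner makes the kind of choice described above.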

Your best bet is not to try to be this clever by using the same array for the output. You say you want to save memory. Yes, in some special cases an `einsum` operation might be doable in place, but `einsum` does not currently detect those cases and skip the zero-setting.

And in a huge number of cases, overwriting one of the input arrays in the middle of the overall operation would cause many problems, much like appending to a list you are directly looping over.
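The question's expression happens to be one of those special cases: `'j, ijk->ijk'` involves no summation, so each output element depends only on the matching input element. A minimal sketch, assuming plain broadcasting is an acceptable substitute for `einsum` here, does the scaling in place with the `np.multiply` ufunc, whose `out` may be the same array as an input for elementwise operations like this:

```python
import numpy as np

A = np.arange(3 * 4 * 3).reshape(3, 4, 3)
P = np.arange(1, 4)

view = A[:, 1:, :]                       # a view into A, no copy is made
# P[:, None] has shape (3, 1), which broadcasts as (1, 3, 1),
# so P[j] scales row j of every block, matching 'j, ijk->ijk'.
np.multiply(view, P[:, None], out=view)  # writes the products back into A
```

No temporary of the full `A[:, 1:, :]` size is needed for the result, which is the memory saving the question was after; for contractions that actually sum over an index, this trick does not apply.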

answered Jan 04 '23 at 00:01 by ely