Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Numpy fusing multiply and add to avoid wasting memory

Is it possible to multiply two ndarray A, and B and add the result to C, without creating a large intermediate array for A times B?

Numpy has the out keyword parameter for the case of C = A times B:

numpy.multiply(A, B, out=C)

How about the case of C += A times B?

like image 626
hamster on wheels Avatar asked Jul 19 '17 20:07

hamster on wheels


2 Answers

Numpy only supports operations one at a time. With that said, there are several workarounds.

In place operations

The most simple solution is to use in-place operations via += and *=

import numpy as np

n = 100
b = 5.0

x = np.random.rand(n)
y = np.random.rand(n)

z = b * x
z += y

BLAS

You can access the underlying BLAS programs and apply them manually. Sadly, there is no multiply add instruction, but there is the "AXPY" instruction, which performs

y <- a * x + y

This can be called via:

import scipy

axpy = scipy.linalg.blas.get_blas_funcs('axpy', arrays=(x, y))
axpy(x, y, n, b)

Numexpr

Another option is to use some package like numexpr which allows you to compile expressions:

import numexpr

z = numexpr.evaluate('b * x + y')

Theano

Recently several machine-learning packages have started supporting compiled expressions, one such package is theano. You could do something like:

import theano

x = theano.tensor.vector()         # declare variable
y = theano.tensor.vector()         # declare variable

out = b * x + y                    # build symbolic expression
f = theano.function([x, y], out)   # compile function

z = f(x, y)
like image 158
Jonas Adler Avatar answered Nov 15 '22 11:11

Jonas Adler


I compared different variants and found that you're not going wrong with SciPy's BLAS interface

scipy.linalg.blas.daxpy(x, y, len(x), a)

enter image description here


Code to reproduce the plot:

import numexpr
import numpy as np
import perfplot
import scipy.linalg
import theano

a = 1.36

# theano preps
x = theano.tensor.vector()
y = theano.tensor.vector()
out = a * x + y
f = theano.function([x, y], out)


def setup(n):
    x = np.random.rand(n)
    y = np.random.rand(n)
    return x, y


def manual_axpy(data):
    x, y = data
    return a * x + y


def manual_axpy_inplace(data):
    x, y = data
    out = a * x
    out += y
    return out


def scipy_axpy(data):
    x, y = data
    n = len(x)
    axpy = scipy.linalg.blas.get_blas_funcs("axpy", arrays=(x, y))
    axpy(x, y, n, a)
    return y


def scipy_daxpy(data):
    x, y = data
    return scipy.linalg.blas.daxpy(x, y, len(x), a)


def numpexpr_evaluate(data):
    x, y = data
    return numexpr.evaluate("a * x + y")


def theano_function(data):
    x, y = data
    return f(x, y)


b = perfplot.bench(
    setup=setup,
    kernels=[
        manual_axpy,
        manual_axpy_inplace,
        scipy_axpy,
        scipy_daxpy,
        numpexpr_evaluate,
        theano_function,
    ],
    n_range=[2 ** k for k in range(24)],
    equality_check=None,
    xlabel="len(x), len(y)",
)
# b.save("out.png")
b.show()
like image 23
Nico Schlömer Avatar answered Nov 15 '22 11:11

Nico Schlömer