Is there an "enhanced" numpy/scipy dot method?

Problem

I would like to compute the following using numpy or scipy:

Y = A**T * Q * A

where A is a m x n matrix, A**T is the transpose of A and Q is an m x m diagonal matrix.

Since Q is a diagonal matrix I store only its diagonal elements as a vector.

Ways of solving for `Y`

Currently I can think of two ways of how to calculate Y:

Y = np.dot(np.dot(A.T, np.diag(Q)), A) and
Y = np.dot(A.T * Q, A).

Clearly option 2 is better than option 1 since no real matrix has to be created with diag(Q) (if this is what numpy really does...)
However, both methods suffer from the defect of having to allocate more memory than there really is necessary since A.T * Q and np.dot(A.T, np.diag(Q)) have to be stored along with A in order to calculate Y.

Question

Is there a method in numpy/scipy that would eliminate the unnecessary allocation of extra memory where you would only pass two matrices A and B (in my case B is A.T) and a weighting vector Q along with it?

266

asked Feb 28 '12 08:02

Woltan

3 Answers

(w/r/t the last sentence of the OP: i am not aware of such a numpy/scipy method but w/r/t the Question in the OP Title (i.e., improving NumPy dot performance) what's below should be of some help. In other words, my answer is directed to improving performance of most of the steps comprising your function for Y).

First, this should give you a noticeable boost over the vanilla NumPy dot method:

>>> from scipy.linalg import blas as FB >>> vx = FB.dgemm(alpha=1., a=v1, b=v2, trans_b=True)

Note that the two arrays, v1, v2 are both in C_FORTRAN order

You can access the byte order of a NumPy array through an array's flags attribute like so:

>>> c = NP.ones((4, 3)) >>> c.flags       C_CONTIGUOUS : True          # refers to C-contiguous order       F_CONTIGUOUS : False         # fortran-contiguous       OWNDATA : True       MASKNA : False       OWNMASKNA : False       WRITEABLE : True       ALIGNED : True       UPDATEIFCOPY : False

to change the order of one of the arrays so both are aligned, just call the NumPy array constructor, pass in the array and set the appropriate order flag to True

>>> c = NP.array(c, order="F")  >>> c.flags       C_CONTIGUOUS : False       F_CONTIGUOUS : True       OWNDATA : True       MASKNA : False       OWNMASKNA : False       WRITEABLE : True       ALIGNED : True       UPDATEIFCOPY : False

You can further optimize by exploiting array-order alignment to reduce excess memory consumption caused by copying the original arrays.

But why are the arrays copied before being passed to dot?

The dot product relies on BLAS operations. These operations require arrays stored in C-contiguous order--it's this constraint that causes the arrays to be copied.

On the other hand, the transpose does not effect a copy, though unfortunately returns the result in Fortran order:

Therefore, to remove the performance bottleneck, you need to eliminate the predicate array-copying step; to do that just requires passing both arrays to dot in C-contiguous order*.

So to calculate dot(A.T., A) without making an extra copy:

>>> import scipy.linalg.blas as FB >>> vx = FB.dgemm(alpha=1.0, a=A.T, b=A.T, trans_b=True)

In sum, the expression just above (along with the predicate import statement) can substitute for dot, to supply the same functionality but better performance

you can bind that expression to a function like so:

>>> super_dot = lambda v, w: FB.dgemm(alpha=1., a=v.T, b=w.T, trans_b=True)

answered Oct 09 '22 11:10

doug

I just wanted to put that up on SO, but this pull request should be helpful and remove the need for a separate function for numpy.dot https://github.com/numpy/numpy/pull/2730 This should be available in numpy 1.7

In the meantime, I used the example above to write a function that can replace numpy dot, whatever the order of your arrays are, and make the right call to fblas.dgemm. http://pastebin.com/M8TfbURi

Hope this helps,

answered Oct 09 '22 11:10

LeSchef

numpy.einsum is what you're looking for:

numpy.einsum('ij, i, ik -> jk', A, Q, A)

This shall not need any additional memory (though usually einsum works slowlier than BLAS operations)

answered Oct 09 '22 09:10

Alleo

Related questions
                            
                                How to create an optimizer in Tensorflow
                            
                                Arrow keys no longer work in Python shell after upgrading Mac OS to Sierra
                            
                                How does one train multiple models in a single script in TensorFlow when there are GPUs present?
                            
                                Issue building cx_Oracle - libclntsh.so.11.1 => not found
                            
                                How to run python macros in LibreOffice?
                            
                                Tensorflow and Multiprocessing: Passing Sessions
                            
                                How can I use matplotlib.pyplot in a docker container?
                            
                                Splitting a string into an iterator
                            
                                Why does Django call it "views.py" instead of controller? [duplicate]
                            
                                python mysqldb multiple cursors for one connection
                            
                                Python: sharing common code among a family of scripts
                            
                                How to include external library with python wheel package
                            
                                Can Anaconda be packaged for a portable zero-configuration install?
                            
                                How can I get Selenium Web Driver to wait for an element to be accessible, not just present?
                            
                                Interrupt (pause) running Python program in pdb?
                            
                                Python center string using format specifier
                            
                                How do I unit testing my GUI program with Python and PyQt?
                            
                                Read a large csv into a sparse pandas dataframe in a memory efficient way
                            
                                "MetaClass", "__new__", "cls" and "super" - what is the mechanism exactly?
                            
                                How do I uninstall a Python module (“egg”) that I installed with easy_install?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is there an "enhanced" numpy/scipy dot method?

Tags:

python

math

numpy

scipy

Problem

Ways of solving for `Y`

Question

Woltan

People also ask

3 Answers

doug

LeSchef

Alleo

Recent Activity

Donate For Us

Is there an "enhanced" numpy/scipy dot method?

Tags:

python

math

numpy

scipy

Problem

Ways of solving for Y

Question

Woltan

People also ask

3 Answers

doug

LeSchef

Alleo

Related questions

Recent Activity

Donate For Us

Ways of solving for `Y`