numpy: compute x.T*x for a large matrix

In numpy, what's the most efficient way to compute x.T * x, where x is a large (200,000 x 1000) dense float32 matrix and .T is the transpose operator?

For the avoidance of doubt, the result is 1000 x 1000.
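
For concreteness, this is the plain NumPy version of the operation being asked about (a minimal sketch; the shape and dtype are taken from the question, and the @ operator assumes Python 3.5+ with NumPy 1.10+):

import numpy as np

# Dense 200,000 x 1000 float32 matrix, as described in the question
x = np.ones((200000, 1000), dtype=np.float32)

# x.T * x in the matrix sense; both spellings hand the work to the underlying BLAS gemm
g = np.dot(x.T, x)   # classic spelling
g = x.T @ x          # equivalent on Python 3.5+ / NumPy 1.10+

print(g.shape)       # (1000, 1000)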

edit: In my original question I stated that np.dot(x.T, x) was taking hours. It turned out that I had some NaNs sneak into the matrix, and for some reason that was completely killing the performance of np.dot (any insights as to why?). This is now resolved, but the original question stands.
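
(Aside on that edit: a cheap finiteness check before the multiply will catch stray NaNs. This is only a sketch; replacing NaNs with 0.0 via np.nan_to_num is an assumption about what the data can tolerate, not something from the original post.)

import numpy as np

x = np.ones((200000, 1000), dtype=np.float32)
x[0, 0] = np.nan                  # simulate a stray NaN sneaking in

if not np.isfinite(x).all():      # cheap check for NaN/Inf before the multiply
    x = np.nan_to_num(x)          # nan -> 0.0, +/-inf -> large finite values

result = np.dot(x.T, x)
print(result.shape)               # (1000, 1000)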

asked Dec 07 '10 by NPE

1 Answer

This may not be the answer you're looking for, but one way to speed it up considerably is to use a GPU instead of your CPU. If you have a decently powerful graphics card around, it'll outperform your CPU any day, even if your system is very well tuned.

For nice integration with numpy, you could use theano (if your graphics card is made by nvidia). The calculation in the following code runs for me in a couple of seconds (although I have a very powerful graphics card):

$ THEANO_FLAGS=device=gpu0 python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import theano
Using gpu device 0: GeForce GTX 480
>>> from theano import tensor as T
>>> import numpy
>>> x = numpy.ones((200000, 1000), dtype=numpy.float32)
>>> m = T.matrix() 
>>> mTm = T.dot(m.T, m)
>>> f = theano.function([m], mTm)
>>> f(x)
array([[ 200000.,  200000.,  200000., ...,  200000.,  200000.,  200000.],
       [ 200000.,  200000.,  200000., ...,  200000.,  200000.,  200000.],
       [ 200000.,  200000.,  200000., ...,  200000.,  200000.,  200000.],
       ..., 
       [ 200000.,  200000.,  200000., ...,  200000.,  200000.,  200000.],
       [ 200000.,  200000.,  200000., ...,  200000.,  200000.,  200000.],
       [ 200000.,  200000.,  200000., ...,  200000.,  200000.,  200000.]], dtype=float32)
>>> r = f(x)
>>> r.shape
(1000, 1000)

I was going to wait to find out how long >>> numpy.dot(x.T, x) took by way of comparison, but I got bored...
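
If you want that CPU-side number for comparison yourself, a minimal timing sketch (sizes match the question; the actual figure will depend heavily on your BLAS build):

import time
import numpy as np

x = np.ones((200000, 1000), dtype=np.float32)

start = time.time()
r = np.dot(x.T, x)
print("np.dot(x.T, x) took %.2f s, shape %s" % (time.time() - start, r.shape))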

You can also try PyCuda or PyOpenCL (if you don't have an nvidia graphics card), although I don't know if their numpy support is as straightforward.

answered by Josh Bleecher Snyder