I have 2 matrix 100kx200 and 200x100k if they were small matrix I would just use numpy dot product <pre class="prettyprint"><code>sum(a.dot(b), axis = 0) </code></pre> however the matrix is too big, and also I can't use loops is there a smart way for doing this?

A possible optimization is <pre class="prettyprint"><code>>>> numpy.sum(a @ b, axis=0) array([ 1.83633615, 18.71643672, 15.26981078, -46.33670382, 13.30276476]) >>> numpy.sum(a, axis=0) @ b array([ 1.83633615, 18.71643672, 15.26981078, -46.33670382, 13.30276476]) </code></pre> Computing <code>a @ b</code> requires 10k×200×10k operations, while summing the rows first will reduce the multiplication to 1×200×10k operations, giving a 10k× improvement. This is mainly due to recognizing <pre class="prettyprint"><code> numpy.sum(x, axis=0) == [1, 1, ..., 1] @ x => numpy.sum(a @ b, axis=0) == [1, 1, ..., 1] @ (a @ b) == ([1, 1, ..., 1] @ a) @ b == numpy.sum(a, axis=0) @ b </code></pre> Similar for the other axis. <pre class="prettyprint"><code>>>> numpy.sum(a @ b, axis=1) array([ 2.8794171 , 9.12128399, 14.52009991, -8.70177811, -15.0303783 ]) >>> a @ numpy.sum(b, axis=1) array([ 2.8794171 , 9.12128399, 14.52009991, -8.70177811, -15.0303783 ]) </code></pre> (Note: <code>x @ y</code> is equivalent to <code>x.dot(y)</code> for 2D matrixes and 1D vectors on Python 3.5+ with numpy 1.10.0+) <hr> <pre class="prettyprint"><code>$ INITIALIZATION='import numpy;numpy.random.seed(0);a=numpy.random.randn(1000,200);b=numpy.random.rand(200,1000)' $ python3 -m timeit -s "$INITIALIZATION" 'numpy.einsum("ij,jk->k", a, b)' 10 loops, best of 3: 87.2 msec per loop $ python3 -m timeit -s "$INITIALIZATION" 'numpy.sum(a@b, axis=0)' 100 loops, best of 3: 12.8 msec per loop $ python3 -m timeit -s "$INITIALIZATION" 'numpy.sum(a, axis=0)@b' 1000 loops, best of 3: 300 usec per loop </code></pre> <hr> Illustration: <pre class="prettyprint"><code>In [235]: a = np.random.rand(3,3) array([[ 0.465, 0.758, 0.641], [ 0.897, 0.673, 0.742], [ 0.763, 0.274, 0.485]]) In [237]: b = np.random.rand(3,2) array([[ 0.303, 0.378], [ 0.039, 0.095], [ 0.192, 0.668]]) </code></pre> Now, if we simply do <code>a @ b</code>, we would need 18 multiply and 6 addition ops. On the other hand, if we do <code>np.sum(a, axis=0) @ b</code> we would only need 6 multiply and 2 addition ops. An improvement of 3x because we had 3 rows in <code>a</code>. As for OP's case, this should give 10k times improvement over simple <code>a @ b</code> computation since he has 10k rows in <code>a</code>.

There are two <code>sum-reductions</code> happening - One from the marix-multilication with <code>np.dot</code>, and then with the explicit <code>sum</code>. We could use <code>np.einsum</code> to do both of those in one go, like so - <pre class="prettyprint"><code>np.einsum('ij,jk->k',a,b) </code></pre> Sample run - <pre class="prettyprint"><code>In [27]: a = np.random.rand(3,4) In [28]: b = np.random.rand(4,3) In [29]: np.sum(a.dot(b), axis = 0) Out[29]: array([ 2.70084316, 3.07448582, 3.28690401]) In [30]: np.einsum('ij,jk->k',a,b) Out[30]: array([ 2.70084316, 3.07448582, 3.28690401]) </code></pre> Runtime test - <pre class="prettyprint"><code>In [45]: a = np.random.rand(1000,200) In [46]: b = np.random.rand(200,1000) In [47]: %timeit np.sum(a.dot(b), axis = 0) 100 loops, best of 3: 5.5 ms per loop In [48]: %timeit np.einsum('ij,jk->k',a,b) 10 loops, best of 3: 71.8 ms per loop </code></pre> Sadly, doesn't look like we are doing any better with <code>np.einsum</code>. <hr> For changing to <code>np.sum(a.dot(b), axis = 1)</code>, just swap the output string notation there - <code>np.einsum('ij,jk->i',a,b)</code>, like so - <pre class="prettyprint"><code>In [42]: np.sum(a.dot(b), axis = 1) Out[42]: array([ 3.97805141, 3.2249661 , 1.85921549]) In [43]: np.einsum('ij,jk->i',a,b) Out[43]: array([ 3.97805141, 3.2249661 , 1.85921549]) </code></pre>

Some quick time tests using the idea I added to Divakar's answer: <pre class="prettyprint"><code>In [162]: a = np.random.rand(1000,200) In [163]: b = np.random.rand(200,1000) In [174]: timeit c1=np.sum(a.dot(b), axis=0) 10 loops, best of 3: 27.7 ms per loop In [175]: timeit c2=np.sum(a,axis=0).dot(b) 1000 loops, best of 3: 432 µs per loop In [176]: timeit c3=np.einsum('ij,jk->k',a,b) 10 loops, best of 3: 170 ms per loop In [177]: timeit c4=np.einsum('j,jk->k', np.einsum('ij->j', a), b) 1000 loops, best of 3: 353 µs per loop In [178]: timeit np.einsum('ij->j', a) @b 1000 loops, best of 3: 304 µs per loop </code></pre> <code>einsum</code> is actually faster than <code>np.sum</code>! <pre class="prettyprint"><code>In [180]: timeit np.einsum('ij->j', a) 1000 loops, best of 3: 173 µs per loop In [181]: timeit np.sum(a,0) 1000 loops, best of 3: 312 µs per loop </code></pre> For larger arrays the <code>einsum</code> advantage decreases <pre class="prettyprint"><code>In [183]: a = np.random.rand(100000,200) In [184]: b = np.random.rand(200,100000) In [185]: timeit np.einsum('ij->j', a) @b 10 loops, best of 3: 51.5 ms per loop In [186]: timeit c2=np.sum(a,axis=0).dot(b) 10 loops, best of 3: 59.5 ms per loop </code></pre>

Row Sum of a dot product for huge matrix in python

Tags:

python

numpy

matrix-multiplication

dot-product

I have 2 matrix 100kx200 and 200x100k if they were small matrix I would just use numpy dot product

sum(a.dot(b), axis = 0)

however the matrix is too big, and also I can't use loops is there a smart way for doing this?

468

asked Mar 09 '17 09:03

Aya Abdelsalam

3 Answers

A possible optimization is

>>> numpy.sum(a @ b, axis=0)
array([  1.83633615,  18.71643672,  15.26981078, -46.33670382,  13.30276476])
>>> numpy.sum(a, axis=0) @ b
array([  1.83633615,  18.71643672,  15.26981078, -46.33670382,  13.30276476])

Computing a @ b requires 10k×200×10k operations, while summing the rows first will reduce the multiplication to 1×200×10k operations, giving a 10k× improvement.

This is mainly due to recognizing

   numpy.sum(x, axis=0) == [1, 1, ..., 1] @ x
=> numpy.sum(a @ b, axis=0) == [1, 1, ..., 1] @ (a @ b)
                            == ([1, 1, ..., 1] @ a) @ b
                            == numpy.sum(a, axis=0) @ b

Similar for the other axis.

>>> numpy.sum(a @ b, axis=1)
array([  2.8794171 ,   9.12128399,  14.52009991,  -8.70177811, -15.0303783 ])
>>> a @ numpy.sum(b, axis=1)
array([  2.8794171 ,   9.12128399,  14.52009991,  -8.70177811, -15.0303783 ])

(Note: x @ y is equivalent to x.dot(y) for 2D matrixes and 1D vectors on Python 3.5+ with numpy 1.10.0+)

$ INITIALIZATION='import numpy;numpy.random.seed(0);a=numpy.random.randn(1000,200);b=numpy.random.rand(200,1000)'

$ python3 -m timeit -s "$INITIALIZATION" 'numpy.einsum("ij,jk->k", a, b)'
10 loops, best of 3: 87.2 msec per loop

$ python3 -m timeit -s "$INITIALIZATION" 'numpy.sum(a@b, axis=0)'
100 loops, best of 3: 12.8 msec per loop

$ python3 -m timeit -s "$INITIALIZATION" 'numpy.sum(a, axis=0)@b'
1000 loops, best of 3: 300 usec per loop

Illustration:

In [235]: a = np.random.rand(3,3)
array([[ 0.465,  0.758,  0.641],
       [ 0.897,  0.673,  0.742],
       [ 0.763,  0.274,  0.485]])

In [237]: b = np.random.rand(3,2)
array([[ 0.303,  0.378],
       [ 0.039,  0.095],
       [ 0.192,  0.668]])

Now, if we simply do a @ b, we would need 18 multiply and 6 addition ops. On the other hand, if we do np.sum(a, axis=0) @ b we would only need 6 multiply and 2 addition ops. An improvement of 3x because we had 3 rows in a. As for OP's case, this should give 10k times improvement over simple a @ b computation since he has 10k rows in a.

164

answered Oct 18 '22 01:10

kennytm

There are two sum-reductions happening - One from the marix-multilication with np.dot, and then with the explicit sum.

We could use np.einsum to do both of those in one go, like so -

np.einsum('ij,jk->k',a,b)

Sample run -

In [27]: a = np.random.rand(3,4)

In [28]: b = np.random.rand(4,3)

In [29]: np.sum(a.dot(b), axis = 0)
Out[29]: array([ 2.70084316,  3.07448582,  3.28690401])

In [30]: np.einsum('ij,jk->k',a,b)
Out[30]: array([ 2.70084316,  3.07448582,  3.28690401])

Runtime test -

In [45]: a = np.random.rand(1000,200)

In [46]: b = np.random.rand(200,1000)

In [47]: %timeit np.sum(a.dot(b), axis = 0)
100 loops, best of 3: 5.5 ms per loop

In [48]: %timeit np.einsum('ij,jk->k',a,b)
10 loops, best of 3: 71.8 ms per loop

Sadly, doesn't look like we are doing any better with np.einsum.

For changing to np.sum(a.dot(b), axis = 1), just swap the output string notation there - np.einsum('ij,jk->i',a,b), like so -

In [42]: np.sum(a.dot(b), axis = 1)
Out[42]: array([ 3.97805141,  3.2249661 ,  1.85921549])

In [43]: np.einsum('ij,jk->i',a,b)
Out[43]: array([ 3.97805141,  3.2249661 ,  1.85921549])

answered Oct 18 '22 01:10

Divakar

Some quick time tests using the idea I added to Divakar's answer:

In [162]: a = np.random.rand(1000,200)
In [163]: b = np.random.rand(200,1000)

In [174]: timeit c1=np.sum(a.dot(b), axis=0)
10 loops, best of 3: 27.7 ms per loop

In [175]: timeit c2=np.sum(a,axis=0).dot(b)
1000 loops, best of 3: 432 µs per loop

In [176]: timeit c3=np.einsum('ij,jk->k',a,b)
10 loops, best of 3: 170 ms per loop

In [177]: timeit c4=np.einsum('j,jk->k', np.einsum('ij->j', a), b)
1000 loops, best of 3: 353 µs per loop

In [178]: timeit np.einsum('ij->j', a) @b
1000 loops, best of 3: 304 µs per loop

einsum is actually faster than np.sum!

In [180]: timeit np.einsum('ij->j', a)
1000 loops, best of 3: 173 µs per loop
In [181]: timeit np.sum(a,0)
1000 loops, best of 3: 312 µs per loop

For larger arrays the einsum advantage decreases

In [183]: a = np.random.rand(100000,200)
In [184]: b = np.random.rand(200,100000)
In [185]: timeit np.einsum('ij->j', a) @b
10 loops, best of 3: 51.5 ms per loop
In [186]: timeit c2=np.sum(a,axis=0).dot(b)
10 loops, best of 3: 59.5 ms per loop

answered Oct 18 '22 01:10

hpaulj

Related questions
                            
                                Interleave list with fixed element
                            
                                Return Custom 404 Error when resource not found in Django Rest Framework
                            
                                Incremental PCA on big data
                            
                                Pandas Standard Deviation returns NaN
                            
                                How would I override the perform_destroy method in django rest framework?
                            
                                Is it possible to append to an xarray.Dataset?
                            
                                Force delete of any previous test database (autoclobber) when running Django unit tests, eg, in PyCharm
                            
                                Can't install elementtree with pip
                            
                                Encoding JSON inside Flask template
                            
                                Python - Unable to modify the serializer.data dictionary in Django Rest Framework
                            
                                When using python os.rmdir, get PermissionError: [WinError 5] Access is denied
                            
                                Fill empty cells in column with value of other columns
                            
                                What is the Scala case class equivalent in PySpark?
                            
                                How to create GIN index on text array column in SQLAlchemy (with PostgreSQL and python)
                            
                                How can I hint that a type is comparable with typing
                            
                                Can numpy's argsort give equal element the same rank?
                            
                                How to grab headers in python selenium-webdriver
                            
                                Python: Error tokenizing data. C error: Calling read(nbytes) on source failed with input nzip file
                            
                                How to execute external script in the Django environment
                            
                                Django annotate specific keys in json fields

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Row Sum of a dot product for huge matrix in python

Tags:

python

numpy

matrix-multiplication

dot-product

Aya Abdelsalam

People also ask

3 Answers

kennytm

Divakar

hpaulj

Recent Activity

Donate For Us