
Performance drop in NumPy matrix-vector multiplication

Tags: python, numpy

I've encountered a (mysterious?) performance issue with NumPy matrix-vector multiplication.

I wrote the following snippet to test the speed of matrix-vector multiplication:

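import timeit

# For each size i, time 10000 matrix-vector products, repeated 5 times,
# and print the mean time per repeat in seconds.
for i in range(90, 101):
    tm = timeit.repeat('np.matmul(a, b)', number=10000, repeat=5,
        setup='import numpy as np; a, b = np.random.rand({0},{0}), np.random.rand({0})'.format(i))
    print(i, sum(tm) / len(tm))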

On some machines, the result is normal:

90 0.08936462279998522
91 0.08872119059979014
92 0.09083068459967762
93 0.09311594780047017
94 0.09907015420012613
95 0.10136517100036144
96 0.10339414420013782
97 0.10627872140012187
98 0.1102267580001353
99 0.11277738099979615
100 0.11471197419996315

On some machines, the multiplication slows down at size 96:

90 0.03618830284103751
91 0.03737151022069156
92 0.03295294055715203
93 0.02851409767754376
94 0.02677299762144685
95 0.028137388220056892
96 0.1916038074065
97 0.16719966367818415
98 0.18511182265356182
99 0.1806833743583411
100 0.17172936061397195

On some machines, it even slows down by a factor of about 1000:

90 0.04183819475583732
91 0.029678784403949977
92 0.02486871089786291
93 0.02882006801664829
94 0.028613184532150625
95 0.02956576123833656
96 31.16711748293601
97 27.803299666382372
98 31.368976181373
99 27.71114011341706
100 26.219610543036833
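
The size of the jump can be read off the listings above by dividing the timing at size 96 by the timing at size 95. A minimal sketch, with the values copied from the three listings; the machine labels and the helper name `slowdown` are only illustrative:

```python
# Mean timings in seconds at sizes 95 and 96, copied from the three listings.
# The labels "first", "second", "third" are just the order of the listings.
timings = {
    "first machine": (0.10136517100036144, 0.10339414420013782),
    "second machine": (0.028137388220056892, 0.1916038074065),
    "third machine": (0.02956576123833656, 31.16711748293601),
}


def slowdown(before, after):
    """Return how many times slower `after` is compared with `before`."""
    return after / before


ratios = {name: slowdown(t95, t96) for name, (t95, t96) in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x slower at size 96 than at size 95")
```

This gives roughly 1x, 7x, and 1000x for the three machines, which matches the description of each listing.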

The Python / NumPy version is the same on all the machines I tested (3.7.2 / 1.16.2). The OS is also the same (Arch Linux).
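
When comparing machines like this, it may help to record a bit more than the Python and NumPy versions. A minimal sketch using only the standard library's `os` and `platform` modules; the helper name `environment_report` is hypothetical, and the fields it collects are only a guess at what might differ between the machines:

```python
import os
import platform


def environment_report():
    """Collect basic facts about the machine running the benchmark."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "logical_cpus": os.cpu_count(),
    }
    try:
        import numpy as np
        report["numpy"] = np.__version__
    except ImportError:
        report["numpy"] = None  # NumPy is not installed in this environment
    return report


for key, value in environment_report().items():
    print(key, value)
```

In addition, `np.show_config()` prints the BLAS/LAPACK build information NumPy was compiled against, which may also vary between machines even when the NumPy version is the same.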

What could be the reason for this? And why does it occur at size 96?

Colera Su asked Mar 08 '19 01:03



1 Answer

At size 96 your test seems to hit a software/hardware limit. 96*96*96 = 884,736, which is close to 1M; multiplied by 8 bytes per float, that is 7,077,888 bytes. Some Intel i5 processors have a 6 MB L3 cache. My iMac has one of these and shows the same slowdown at size 96. The Intel® Core™ i5-7200U processor has a 3 MB L3 cache and doesn't have this problem. So it could be that the underlying algorithm does not work well with a 6 MB cache.
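
The arithmetic in this answer can be checked with a few lines of Python. The sketch below reproduces the quoted figures and, for comparison, also prints the size of the operands that the question's benchmark passes to `np.matmul` (a 96x96 matrix and a 96-element vector of 8-byte floats). Treating 1 MB as 2**20 bytes is an assumption, and the variable names are only illustrative:

```python
n = 96
BYTES_PER_FLOAT = 8          # NumPy's default float64
MB = 2 ** 20                 # assumption: 1 MB taken as 2**20 bytes

# Figures quoted in the answer.
cubed_elements = n * n * n
cubed_bytes = cubed_elements * BYTES_PER_FLOAT

# Operands actually created by the benchmark's setup string.
matrix_bytes = n * n * BYTES_PER_FLOAT
vector_bytes = n * BYTES_PER_FLOAT

print("96*96*96 elements:", cubed_elements)
print("96*96*96 * 8 bytes:", cubed_bytes)
print("6 MB cache in bytes:", 6 * MB)
print("3 MB cache in bytes:", 3 * MB)
print("96x96 matrix in bytes:", matrix_bytes)
print("96-element vector in bytes:", vector_bytes)
```

The 7,077,888-byte figure is larger than both cache sizes mentioned, while the matrix and vector themselves take 73,728 and 768 bytes, so the numbers alone do not settle whether the cache is the cause.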

Alex Lopatin answered Nov 04 '22 10:11
