
Python numpy methods/attributes faster than numpy functions?

I recently noticed that some numpy array attributes/methods seem to be significantly faster than the corresponding numpy functions. Example for np.conjugate(x) vs. x.conjugate():

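import time

import numpy as np

np.random.seed(100)

t_func = 0.0    # accumulated time for np.conjugate(a)
t_method = 0.0  # accumulated time for a.conjugate()
for i in range(1000):
    a = np.random.rand(10000)
    # perf_counter is a monotonic, high-resolution clock meant for timing
    t0 = time.perf_counter()
    b = np.conjugate(a)
    t_func += time.perf_counter() - t0
    t0 = time.perf_counter()
    c = a.conjugate()
    t_method += time.perf_counter() - t0

print(t_func, t_method)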

# example output times: 0.01222848892211914 0.0008714199066162109

Even without proper benchmarks, it looks like there is a performance gain of more than a factor of 10. Similarly, x.real, x.imag, x.max() and other basic attributes/methods also seem to be faster than the corresponding functions np.real(x), np.imag(x), np.max(x) etc.
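A steadier measurement than summing clock differences by hand could use the standard-library timeit module. The following is only a minimal sketch: the helper name best_time and the repeat counts are illustrative assumptions, and the lambda adds the same small call overhead to both measurements.

```python
import timeit

import numpy as np


def best_time(func, number=1000, repeat=5):
    """Best per-call time in seconds for a zero-argument callable.

    `best_time` is a hypothetical helper written for this sketch.
    """
    # Each repeat runs func `number` times; the minimum is least affected
    # by other activity on the machine.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


rng = np.random.default_rng(100)
a = rng.random(10000)

t_func = best_time(lambda: np.conjugate(a))
t_method = best_time(lambda: a.conjugate())
print(f"np.conjugate(a): {t_func * 1e6:.3f} µs per call")
print(f"a.conjugate():   {t_method * 1e6:.3f} µs per call")
```

The absolute numbers will differ between machines and NumPy versions; only the ratio between the two lines is of interest here.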

Can somebody explain to me where the time saving comes from? Does it have to do with in-place operations vs. new array creation? Are there certain checks that the numpy functions do which are skipped for the array methods? Thank you in advance!

Update: Below is a simple comparison of computation times for several common numpy functions/methods, for float, complex and boolean arrays. The largest speed gains of methods over functions (factors for float/complex/bool) appear to be for a.real (12/15/12), a.imag (70/15/26) and a.conj (80/1/33), as explained in the post by @hpaulj (imag and conj are not useful for real arrays, though). Next come a.sort (5/4/1.5), where my guess is that this is due to in-place operations, and a.max/a.min (1.6 for bool; again, max and min are not useful for bool arrays). Other speed gains are typically between 1.1 and 1.4. For a.argsort, a.std and a.__len__ the factors are often around 1, and for a.__abs__ even below 1.

So it looks like, except for a.real, a.imag and a.sort, the speed gains are often modest, around a factor of 1.2. However, this may depend on the array size, whether the array is (partially) sorted or not, etc.

import numpy as np
from IPython import get_ipython

ipython = get_ipython()

np.random.seed(1000)

asize = 10000
dtype_list = ['float', 'complex', 'bool']
for i in range(3):
    print(dtype_list[i])
    print('-----------------')
    if i == 0:
        a = np.random.rand(asize)
    elif i == 1:
        a = np.random.rand(asize) + 1j*np.random.rand(asize)
    elif i == 2:
        a = np.random.randint(2,size=asize).astype(bool)
    
    function_list = [np.real, np.imag, np.conj, np.sum, np.cumsum, np.prod, np.cumprod,
                     np.max, np.min, np.argmax, np.argmin, np.mean, np.var, np.std,
                     np.sort, np.argsort, np.all, np.any, np.abs, len]
    methatt_list = [a.real, a.imag, a.conj, a.sum, a.cumsum, a.prod, a.cumprod,
                    a.max, a.min, a.argmax, a.argmin, a.mean, a.var, a.std,
                    a.sort, a.argsort, a.all, a.any, a.__abs__, a.__len__]
    for j in range(len(function_list)):
        print(function_list[j].__name__)
        # run_line_magic replaces the deprecated ipython.magic call
        ipython.run_line_magic('timeit', 'function_list[j](a)')
        if callable(methatt_list[j]):
            ipython.run_line_magic('timeit', 'methatt_list[j]()')
        else:
            ipython.run_line_magic('timeit', 'methatt_list[j]')
    print('')

# float
# -----------------
# real
# 740 ns ± 13.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 60.7 ns ± 0.226 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# imag
# 4.45 µs ± 36.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 60.9 ns ± 0.353 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# conjugate
# 9.64 µs ± 40.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 124 ns ± 0.238 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# sum
# 15.8 µs ± 101 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 11.8 µs ± 82.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# cumsum
# 42.4 µs ± 254 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 37.7 µs ± 38.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# prod
# 32.7 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 29 µs ± 57.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# cumprod
# 51.5 µs ± 102 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 47.1 µs ± 154 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# amax
# 14.5 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 10.7 µs ± 61.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# amin
# 14.6 µs ± 90.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 10.7 µs ± 45.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# argmax
# 11.1 µs ± 15.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 8.62 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# argmin
# 11.5 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 8.76 µs ± 37 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# mean
# 23.5 µs ± 440 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 19.6 µs ± 569 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# var
# 78.6 µs ± 381 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 73.3 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# std
# 86.7 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 81.9 µs ± 663 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# sort
# 659 µs ± 1.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 141 µs ± 682 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# argsort
# 156 µs ± 508 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 151 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# all
# 23.4 µs ± 41.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 17.7 µs ± 17.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# any
# 23.4 µs ± 72.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 17.3 µs ± 67 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# absolute
# 7.1 µs ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 7.25 µs ± 20.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# len
# 125 ns ± 0.17 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# 117 ns ± 0.463 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# complex
# -----------------
# real
# 920 ns ± 1.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 61.1 ns ± 0.0517 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# imag
# 898 ns ± 0.792 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 61.3 ns ± 0.178 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# conjugate
# 18.1 µs ± 45.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 18.6 µs ± 7.75 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# sum
# 24 µs ± 40 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 18.7 µs ± 97 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# cumsum
# 44.8 µs ± 80.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 39.4 µs ± 135 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# prod
# 99.6 µs ± 195 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 95.4 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# cumprod
# 94.9 µs ± 245 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 89.7 µs ± 284 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# amax
# 41.3 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 37 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# amin
# 41.7 µs ± 65.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 37.1 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# argmax
# 27.4 µs ± 47.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 24.5 µs ± 77.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# argmin
# 28.8 µs ± 28.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 25.5 µs ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# mean
# 32.2 µs ± 43.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 27.6 µs ± 116 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# var
# 139 µs ± 844 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 135 µs ± 476 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# std
# 147 µs ± 195 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 145 µs ± 2.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# sort
# 774 µs ± 3.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 201 µs ± 145 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# argsort
# 277 µs ± 2.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 271 µs ± 123 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# all
# 37.9 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 31 µs ± 252 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# any
# 37.5 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 30.2 µs ± 11.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# absolute
# 217 µs ± 2.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 216 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# len
# 121 ns ± 0.38 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# 117 ns ± 1.23 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# bool
# -----------------
# real
# 726 ns ± 4.61 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 60.5 ns ± 0.0926 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# imag
# 1.55 µs ± 2.44 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 60.7 ns ± 0.123 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# conjugate
# 4.16 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 125 ns ± 0.339 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# sum
# 24.2 µs ± 82.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 19.3 µs ± 82.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# cumsum
# 48.2 µs ± 428 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 41.2 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# prod
# 29.2 µs ± 73.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 25.3 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# cumprod
# 53.7 µs ± 83.5 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 46.6 µs ± 136 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# amax
# 9.37 µs ± 93 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 5.81 µs ± 21.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# amin
# 9.16 µs ± 15.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 5.75 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# argmax
# 2.93 µs ± 8.85 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 589 ns ± 5.33 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# argmin
# 3.07 µs ± 14.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 622 ns ± 4.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# mean
# 33.5 µs ± 27.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 29.1 µs ± 286 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# var
# 111 µs ± 749 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 105 µs ± 735 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# std
# 117 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 113 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# sort
# 157 µs ± 407 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 105 µs ± 433 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# argsort
# 115 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# 112 µs ± 925 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# all
# 8.26 µs ± 9.85 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 3.86 µs ± 11.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# any
# 8.49 µs ± 23 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 4 µs ± 30.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# absolute
# 1.52 µs ± 3.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# 1.72 µs ± 2.95 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
# len
# 122 ns ± 0.24 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
# 117 ns ± 0.279 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
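
Two caveats apply when reading these numbers. First, a.sort() sorts in place, so after its first timed call every later call (and the argsort, all, any and abs rows that follow) runs on already sorted data, whereas np.sort(a) was timed on the unsorted array. Second, a.real and a.imag are evaluated when methatt_list is built, so timing methatt_list[j] measures only a list lookup. A fairer sort comparison might look like the sketch below, where time_sort is a hypothetical helper that hands each timed call a fresh unsorted copy:

```python
import timeit

import numpy as np


def time_sort(sort_call, data, repeat=20):
    """Best time in seconds for sort_call(arr) on a fresh copy of data.

    `time_sort` is a hypothetical helper written for this sketch.
    """
    timer = timeit.Timer(
        stmt="sort_call(arr)",
        setup="arr = data.copy()",  # untimed: runs before each repeat
        globals={"sort_call": sort_call, "data": data},
    )
    # number=1 ensures the setup copy is made before every timed call.
    return min(timer.repeat(repeat=repeat, number=1))


rng = np.random.default_rng(1000)
a = rng.random(10000)

t_func = time_sort(np.sort, a)                    # returns a sorted copy
t_method = time_sort(lambda arr: arr.sort(), a)   # sorts arr in place
print(f"np.sort(arr): {t_func * 1e6:.1f} µs")
print(f"arr.sort():   {t_method * 1e6:.1f} µs")

# The in-place behaviour that skews repeated timings:
b = a.copy()
np.sort(b)                                   # result discarded, b untouched
unchanged = np.array_equal(b, a)
b.sort()                                     # b itself is now sorted
now_sorted = bool(np.all(b[:-1] <= b[1:]))
print(unchanged, now_sorted)
```

With a fresh copy per call, any remaining difference between the two lines reflects the function wrapper and the copy that np.sort makes, not the state of the input.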
bproxauf asked Oct 24 '25 23:10


1 Answer

numpy functions often delegate the action to a method, if it exists. But they must also check that the argument is an array (converting it if it is not), and so on. ufuncs also have some extra 'baggage' that handles parameters like out and where. This overhead is paid per call, so the time differences don't (necessarily) scale with array size.
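
As a small illustration of that extra work (the variable names are only for this sketch): the function accepts any array-like input and converts it before reducing, while the method exists only on arrays.

```python
import numpy as np

values = [3.0, 1.0, 2.0]   # a plain list, not an ndarray

# The function converts array-like input before computing the maximum.
func_result = np.max(values)

# A list has no .max method, so the method route needs an explicit conversion.
has_method = hasattr(values, "max")
method_result = np.asarray(values).max()

print(func_result, method_result, has_method)
```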

In [400]: a = np.random.rand(10000)

Comparing conjugate:

In [404]: timeit np.conjugate(a)
10 µs ± 15.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [405]: timeit a.conjugate()
94.2 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

That ns time suggests that the method is taking some sort of shortcut. (I'll explore that later)

The time difference for max isn't as significant, and I attribute it to the function overhead:

In [406]: timeit np.max(a)
13.2 µs ± 16.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [407]: timeit a.max()
9.46 µs ± 79.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But let's test with a complex array, where conjugate isn't trivial:

In [408]: ac = a+1j*a

Now the method and the function take the same time:

In [409]: timeit np.conjugate(ac)
18.2 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [410]: timeit ac.conjugate()
18.3 µs ± 10.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The real attribute is still much faster. Looking at the Python code for np.real, I think the time difference is just due to the function wrapper.

In [411]: timeit np.real(ac)
743 ns ± 21.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [413]: timeit ac.real
129 ns ± 4.93 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The conjugate method for a float array just returns a view (or maybe the array itself). That accounts for its speed:

In [418]: a.__array_interface__['data']
Out[418]: (84672384, False)
In [419]: a.conjugate().__array_interface__['data']
Out[419]: (84672384, False)
In [420]: ac.__array_interface__['data']
Out[420]: (84992432, False)
In [421]: ac.conjugate().__array_interface__['data']
Out[421]: (85165216, False)

It's the array itself:

In [422]: id(a)
Out[422]: 140673862490512
In [423]: id(a.conjugate())
Out[423]: 140673862490512
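
One practical consequence of this shortcut, sketched under the assumption that the behaviour shown above holds in your NumPy version: for a real array the method's result aliases the input, while np.conjugate allocates a new array. np.shares_memory is used so the check holds whether the method returns a view or the array itself.

```python
import numpy as np

a = np.arange(5, dtype=float)
ac = a + 1j * a

method_shares = np.shares_memory(a, a.conjugate())      # real input, method
func_shares = np.shares_memory(a, np.conjugate(a))      # real input, function
complex_shares = np.shares_memory(ac, ac.conjugate())   # complex input, method
print(method_shares, func_shares, complex_shares)

# Because the real-array result aliases the input, writing to it changes `a`.
c = a.conjugate()
c[0] = 99.0
print(a[0])
```

If an independent result is needed for a real array, a.conjugate().copy() or np.conjugate(a) avoids the aliasing.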

np.real code:

def real(val):
    try:
        return val.real
    except AttributeError:
        return asanyarray(val).real
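
Both branches of this wrapper can be exercised directly. In the sketch below (variable names are illustrative), an ndarray takes the val.real path and gets back a view, while a plain list has no .real attribute and goes through the asanyarray fallback:

```python
import numpy as np

ac = np.array([1 + 2j, 3 + 4j])

# ndarray input: np.real returns ac.real, a view into ac's memory.
view = np.real(ac)
is_view = np.shares_memory(ac, view)

# list input: no .real attribute, so the input is converted first.
list_has_real = hasattr([1 + 2j, 3 + 4j], "real")
from_list = np.real([1 + 2j, 3 + 4j])

print(is_view, list_has_real, from_list)
```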
hpaulj answered Oct 26 '25 12:10


