I need to optimize a script that makes heavy use of computing L1 norm of vectors. As we know L1 norm in this case is just a sum of absolute values. When timing how fast numpy is in this task I found something weird: addition of all vector elements is about 3 times faster than taking absolute value of every element of the vector. This is a surprising result, as addition is pretty complex in comparison to taking absolute value, which only requires zeroing every 32-th bit of a datablock (assuming float32).
Why is that addition is 3x faster than a simple bitwise operation?
import numpy as np
a = np.random.rand(10000000)
%timeit np.sum(a)
13.9 ms ± 87.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit np.abs(a)
41.2 ms ± 92.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
There are several things to consider here. sum
returns a scalar abs
returns an array. So even if adding two numbers and taking the absolute had the same speed abs
would be slower because it needs to create the array. And it has to process twice as many elements (reading from input + writing to output).
So you cannot infer from these timings anything about the speed of addition vs. bitwise operation.
You could however check if it's faster to add something to each value of an array vs. taking the absolute of each value
%timeit a + 0.1
9 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit abs(a)
9.98 ms ± 532 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Or compare sum + memory allocation vs. taking the absolute
%timeit np.full_like(a, 1); np.sum(a)
13.4 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit abs(a)
9.64 ms ± 320 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Just in case you want to make computing the norm faster, you could try numba (or Cython, or writing a C or Fortran routine yourself), that way you avoid any memory allocations:
import numba as nb
@nb.njit
def sum_of_abs(arr):
sum_ = 0
for item in arr:
sum_ += abs(item)
return sum_
sum_of_abs(a) # one call for the jitter to kick in
%timeit sum_of_abs(a)
# 2.44 ms ± 315 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With