Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is numpy.absolute() so slow?

I need to optimize a script that makes heavy use of computing L1 norm of vectors. As we know L1 norm in this case is just a sum of absolute values. When timing how fast numpy is in this task I found something weird: addition of all vector elements is about 3 times faster than taking absolute value of every element of the vector. This is a surprising result, as addition is pretty complex in comparison to taking absolute value, which only requires zeroing every 32-th bit of a datablock (assuming float32).

Why is that addition is 3x faster than a simple bitwise operation?

import numpy as np

a = np.random.rand(10000000)

%timeit np.sum(a)
13.9 ms ± 87.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit np.abs(a)
41.2 ms ± 92.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
like image 797
Lugi Avatar asked Oct 01 '17 13:10

Lugi


1 Answers

There are several things to consider here. sum returns a scalar abs returns an array. So even if adding two numbers and taking the absolute had the same speed abs would be slower because it needs to create the array. And it has to process twice as many elements (reading from input + writing to output).

So you cannot infer from these timings anything about the speed of addition vs. bitwise operation.

You could however check if it's faster to add something to each value of an array vs. taking the absolute of each value

%timeit a + 0.1
9 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit abs(a)
9.98 ms ± 532 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Or compare sum + memory allocation vs. taking the absolute

%timeit np.full_like(a, 1); np.sum(a)
13.4 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit abs(a) 
9.64 ms ± 320 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Just in case you want to make computing the norm faster, you could try numba (or Cython, or writing a C or Fortran routine yourself), that way you avoid any memory allocations:

import numba as nb

@nb.njit
def sum_of_abs(arr):
    sum_ = 0
    for item in arr:
        sum_ += abs(item)
    return sum_

sum_of_abs(a)  # one call for the jitter to kick in
%timeit sum_of_abs(a)
# 2.44 ms ± 315 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
like image 135
MSeifert Avatar answered Sep 17 '22 02:09

MSeifert