I am looking for the most efficient and shortest way to compute the square root of a sum of squares of two or more numbers. I am currently using numpy
and this code:
np.sqrt(i**2+j**2)
That seems about five times faster than:
np.sqrt(sum(np.square([i,j])))
(i and j are two numbers!)
I was wondering if there is already a built-in function that performs this very common task more efficiently and with even less code.
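For reference, numpy does ship a built-in for exactly this operation: np.hypot computes sqrt(i**2 + j**2) element-wise and is written to avoid overflow for very large inputs. Whether it is faster than the hand-written expression depends on input size, so time it on your own data; a minimal sketch:

```python
import numpy as np

# np.hypot(i, j) returns sqrt(i**2 + j**2), for scalars or arrays
i, j = 3.0, 4.0
print(np.hypot(i, j))  # → 5.0

# it also works element-wise on arrays
a = np.array([3.0, 5.0])
b = np.array([4.0, 12.0])
print(np.hypot(a, b))  # → [ 5. 13.]
```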
For the case of i != j, this is not directly possible with np.linalg.norm, so I recommend the following:
(i*i + j*j)**0.5
If i and j are single floats, this is about 5 times faster than np.sqrt(i**2+j**2). If i and j are numpy arrays, this is about 20% faster (due to replacing the squares with i*i and j*j). If you do not replace the squares, the performance is equal to np.sqrt(i**2+j**2).
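A quick sanity check (my assumption: the two expressions agree to floating-point precision) that the recommended expression really computes the same values:

```python
import numpy as np

i = np.linspace(0.0, 100.0, 1000)
j = np.linspace(100.0, 0.0, 1000)

# recommended expression vs. the original one
a = (i*i + j*j)**0.5
b = np.sqrt(i**2 + j**2)
print(np.allclose(a, b))  # → True
```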
Some timings using single floats:
i = 23.7
j = 7.5e7
%timeit np.sqrt(i**2 + j**2)
# 1.63 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit (i*i + j*j)**0.5
# 336 ns ± 7.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit math.sqrt(i*i + j*j)
# 321 ns ± 8.21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
math.sqrt is slightly faster than (i*i + j*j)**0.5, but this comes at the cost of losing flexibility: (i*i + j*j)**0.5 will work on single floats AND arrays, whereas math.sqrt will only work on scalars.
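To make the flexibility difference concrete, here is a small demonstration of the same expression on an array and on plain floats, and of math.sqrt rejecting an array input:

```python
import math
import numpy as np

a = np.array([3.0, 8.0])
b = np.array([4.0, 6.0])

print((a*a + b*b)**0.5)          # works on arrays: [ 5. 10.]
print((3.0*3.0 + 4.0*4.0)**0.5)  # and on plain floats: 5.0

try:
    math.sqrt(a*a + b*b)  # raises TypeError on a size-2 array
except TypeError:
    print("math.sqrt only accepts scalars")
```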
And some timings for medium-sized arrays:
i = np.random.rand(100000)
j = np.random.rand(100000)
%timeit np.sqrt(i**2 + j**2)
# 1.45 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit (i*i + j*j)**0.5
# 1.21 ms ± 78.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Instead of optimizing this fairly simple function call, you could try to rewrite your program so that i and j are arrays instead of single numbers (assuming that you need to call the function on a lot of different inputs). See this small benchmark:
import numpy as np
i = np.arange(10000)
j = np.arange(10000)
%%timeit
np.sqrt(i**2+j**2)
# 74.1 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%%timeit
for idx in range(len(i)):
    np.sqrt(i[idx]**2+j[idx]**2)
# 25.2 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
As you can see, the first variant (using arrays of numbers as input) is ~300x faster than the second one using a Python for loop. The reason for this is that in the first example, all computation is performed by numpy (which is implemented in C internally and therefore really fast), whereas in the second example, numpy code and regular Python code (the for loop) interleave, making the execution much slower.
If you really want to improve the performance of your program, I'd suggest rewriting it so that you can perform your function once on two numpy arrays instead of calling it for each pair of numbers.
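As a sketch of that rewrite (the pair values here are made up for illustration): collect all input pairs first, convert them to two arrays once, and apply the expression a single time instead of once per pair.

```python
import numpy as np

pairs = [(3.0, 4.0), (5.0, 12.0), (8.0, 15.0)]

# per-pair (slow) version: one numpy call per pair
per_pair = [np.sqrt(x**2 + y**2) for x, y in pairs]

# vectorized (fast) version: build the arrays once, call once
i, j = np.array(pairs).T
vectorized = (i*i + j*j)**0.5

print(vectorized)  # → [ 5. 13. 17.]
```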