Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the most efficient way of doing square root of sum of square of two numbers?

I am looking for the more efficient and shortest way of performing the square root of a sum of squares of two or more numbers. I am actually using numpy and this code:

np.sqrt(i**2+j**2)

That seems five time faster than:

np.sqrt(sum(np.square([i,j])))

(i and j are to numbers!)

I was wondering if there was already a built-in function more efficient to perform this very common task with even less code.

like image 601
G M Avatar asked Jun 28 '18 07:06

G M


People also ask

What is the best estimate for √ 2?

√2 is between 1.41 and 1.42. So, √2 ≈ 1.415.


2 Answers

For the case of i != j it is not possible to do this with np.linalg.norm, thus I recommend the following:

(i*i + j*j)**0.5

If i and j are single floats, this is about 5 times faster than np.sqrt(i**2+j**2). If i and j are numpy arrays, this is about 20% faster (due to replacing the square with i*i and j*j. If you do not replace the squares, the performance is equal to np.sqrt(i**2+j**2).
Some timings using single floats:

i = 23.7
j = 7.5e7
%timeit np.sqrt(i**2 + j**2)
# 1.63 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit (i*i + j*j)**0.5
# 336 ns ± 7.38 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit math.sqrt(i*i + j*j)
# 321 ns ± 8.21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

math.sqrt is slightly faster than (i*i + j*j)**0.5, but this comes at the cost of losing flexibility: (i*i + j*j)**0.5 will work on single floats AND arrays, whereas math.sqrt will only work on scalars.

And some timings for medium-sized arrays:

i = np.random.rand(100000)
j = np.random.rand(100000)
%timeit np.sqrt(i**2 + j**2)
# 1.45 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit (i*i + j*j)**0.5
# 1.21 ms ± 78.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
like image 109
JE_Muc Avatar answered Oct 10 '22 22:10

JE_Muc


Instead of optimizing this fairly simple function call, you could try to rewrite your program such that i and j are arrays instead of single numbers (assuming that you need to call the function on a lot of different inputs). See this small benchmark:

import numpy as np
i = np.arange(10000)
j = np.arange(10000)

%%timeit 
np.sqrt(i**2+j**2)
# 74.1 µs ± 2.74 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
for idx in range(len(i)):
    np.sqrt(i[idx]**2+j[idx]**2)
# 25.2 ms ± 1.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

As you can see, the first variant (using arrays of numbers as input) is ~300x faster than the second one using a python for loop. The reason for this is that in the first example, all computation is performed by numpy (which is implemented in c internally and therefore really fast), whereas in the second example, numpy code and regular python code (the for loop) interleave, making the execution much slower.

If you really want to improve the performance of your program, I'd suggest rewriting it so that you can perform your function once on two numpy arrays instead of calling it for each pair of numbers.

like image 20
Felix Avatar answered Oct 10 '22 22:10

Felix