How to do hypot2(x,y) calculation when numbers can overflow

Tags:

math

I'd like to do a hypot2 calculation on a 16-bit processor.

The standard formula is c = sqrt((a * a) + (b * b)). The problem with this is that with large inputs it overflows. E.g. with 200 and 250, 200 * 200 gives 40,000, which is higher than the max signed 16-bit value of 32,767, so it overflows, as does 250 * 250 = 62,500. The wrapped values are then added and the result is useless; it might even signal an error condition because of a negative sqrt argument.

In my case, I'm dealing with 32-bit numbers, but 32-bit multiply on my processor is very fast, about 4 cycles. I'm using a dsPIC microcontroller. I'd rather not have to multiply with 64-bit numbers because that's wasting precious memory and undoubtedly will be slower. Additionally I only have sqrt for 32-bit numbers, so 64-bit numbers would require another function. So how can I compute a hypot when the values may be large?

Please note I can only really use integer math for this. Using anything like floating point math incurs a speed hit which I'd rather avoid. My processor has a fast integer/fixed point atan2 routine, about 130 cycles; could I use this to compute the hypotenuse length?

Thomas O asked Dec 07 '10 20:12


4 Answers

Depending on how much accuracy you need you may be able to avoid the squares and the square root operation. There is a section on this topic in Understanding Digital Signal Processing by Rick Lyons (section 10.2, "High-Speed Vector-Magnitude Approximation", starting at page 400 in my edition).

The approximation is essentially:

magnitude = alpha * max + beta * min

where max and min are the maximum and minimum absolute values of the real and imaginary components, and alpha and beta are two constants which are chosen to give a reasonable error distribution over the range of interest. These constants can be represented as fractions with power-of-2 divisors to keep the arithmetic simple/efficient. In the book he suggests alpha = 15/16, beta = 15/32, and you can then simplify the formula to:

magnitude = (15 / 16) * (max + min / 2)

which might be implemented as follows using integer operations:

magnitude = 15 * (max + min / 2) / 16

and of course we can use shifts for the divides:

magnitude = (15 * (max + (min >> 1))) >> 4

Error is roughly +/- 5% over a quadrant (from about -6.25% when min = 0 to about +4.8% when min = max / 2).

More information on this technique here: http://www.dspguru.com/dsp/tricks/magnitude-estimator
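As a rough sketch of how the estimator might look in C for signed 32-bit inputs: the names mag_estimate and abs_u32 are made up for illustration, and the final step uses s - (s >> 4) in place of (15 * s) >> 4. That substitution is my own assumption to keep the intermediate value inside 32 bits; it can differ from the formula above by 1 in the last place.

```c
#include <stdint.h>

/* Absolute value as unsigned, so that INT32_MIN does not overflow. */
static uint32_t abs_u32(int32_t v)
{
    return v < 0 ? (uint32_t)0 - (uint32_t)v : (uint32_t)v;
}

/* Approximates sqrt(a*a + b*b) as (15/16) * (max + min/2).
 * Hypothetical helper, not taken from the book or the linked page. */
uint32_t mag_estimate(int32_t a, int32_t b)
{
    uint32_t ua = abs_u32(a);
    uint32_t ub = abs_u32(b);
    uint32_t max = ua > ub ? ua : ub;
    uint32_t min = ua > ub ? ub : ua;

    /* max + min/2 is at most 1.5 * 2^31, which still fits in 32 bits. */
    uint32_t s = max + (min >> 1);

    /* s * 15/16 computed as s - s/16, so 15 * s is never formed. */
    return s - (s >> 4);
}
```

For the inputs from the question, mag_estimate(200, 250) gives 329 against a true value of about 320.2, which is inside the error band quoted above.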

Paul R answered Sep 28 '22 07:09


This is taken verbatim from this @John D. Cook blog post, hence CW:

Here’s how to compute sqrt(x*x + y*y) without risking overflow.

  1. max = maximum(|x|, |y|)
  2. min = minimum(|x|, |y|)
  3. r = min / max
  4. return max*sqrt(1 + r*r)
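With integers only, r = min / max would truncate to 0 or 1, so the steps above need a fixed-point ratio. Here is one possible sketch using Q15 for r and nothing wider than 32 bits. The Q15 format, the function names (hypot_scaled, isqrt32, abs_u32) and the bit-by-bit square root are all my assumptions and are not part of the blog post; on the dsPIC the processor's own 32-bit sqrt routine would presumably replace isqrt32.

```c
#include <stdint.h>

static uint32_t abs_u32(int32_t v)
{
    return v < 0 ? (uint32_t)0 - (uint32_t)v : (uint32_t)v;
}

/* Bit-by-bit integer square root: floor(sqrt(n)) for a 32-bit n. */
static uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = UINT32_C(1) << 30;   /* highest power of four in 32 bits */
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* max * sqrt(1 + r*r) with r = min/max held in Q15 (32768 == 1.0). */
uint32_t hypot_scaled(int32_t x, int32_t y)
{
    uint32_t ux = abs_u32(x), uy = abs_u32(y);
    uint32_t max = ux > uy ? ux : uy;
    uint32_t min = ux > uy ? uy : ux;
    if (max == 0)
        return 0;

    /* Shrink both operands together until (lo << 15) fits in 32 bits. */
    uint32_t hi = max, lo = min;
    while (hi >= (UINT32_C(1) << 17)) {
        hi >>= 1;
        lo >>= 1;
    }
    uint32_t r = (lo << 15) / hi;                        /* 0 .. 32768 */

    /* 1 + r*r in Q30; its square root comes out in Q15. */
    uint32_t s = isqrt32((UINT32_C(1) << 30) + r * r);   /* 32768 .. 46340 */

    /* (max * s) >> 15, split so neither partial product exceeds 32 bits. */
    return (max >> 15) * s + (((max & 0x7FFF) * s) >> 15);
}
```

Since r and the square root each carry about 15 bits, the result should be good to roughly 1 part in 2^15 rather than to the last bit.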

If @John D. Cook comes along and posts this you should give him the accept :)

AakashM answered Sep 28 '22 08:09


Since you essentially can't do any multiplications without overflow, you're likely going to lose some precision.

To get the numbers into an acceptable range, pull out some factor x and use

c = x*sqrt( (a/x)*(a/x) + (b/x)*(b/x) ) 

If x is a common factor, you won't lose precision, but if it's not, you will lose precision.
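A minimal sketch of this idea in C, assuming x is chosen as a power of two so that the divisions and the final multiplication become shifts. The names hypot_prescale, isqrt32 and abs_u32 are hypothetical, and isqrt32 merely stands in for whatever 32-bit square root the target provides.

```c
#include <stdint.h>

static uint32_t abs_u32(int32_t v)
{
    return v < 0 ? (uint32_t)0 - (uint32_t)v : (uint32_t)v;
}

/* Bit-by-bit integer square root: floor(sqrt(n)) for a 32-bit n. */
static uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = UINT32_C(1) << 30;
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* c = x * sqrt((a/x)^2 + (b/x)^2) with x = 2^k, using 32-bit math only. */
uint32_t hypot_prescale(int32_t a, int32_t b)
{
    uint32_t ua = abs_u32(a), ub = abs_u32(b);
    unsigned k = 0;

    /* Divide both by 2 until each is below 2^15, so that each square
     * is below 2^30 and their sum is below 2^31. */
    while (((ua | ub) >> 15) != 0) {
        ua >>= 1;
        ub >>= 1;
        k++;
    }
    return isqrt32(ua * ua + ub * ub) << k;
}
```

Inputs that already fit in 15 bits pass through unscaled, so hypot_prescale(200, 250) returns 320 with no loss beyond the floor of the square root.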

Update: Even better, given that you can do some mild work with 64-bit numbers, with just one 64-bit addition you could do the rest of this problem in 32 bits with only a tiny loss of accuracy. To do this: do the two 32-bit multiplications to give you two 64-bit numbers, add these, and then bit shift as needed to get the sum back down to 32 bits before taking the square root. If you always shift in steps of 2 bits, then just multiply the final result by 2^(half the number of bits shifted), based on the rule above. The truncation of the sum should only cause a very small loss of accuracy, on the order of 1 part in 2^31, or 0.00000005% error.
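The update could be sketched in C roughly as follows. This assumes the compiler offers a 64-bit integer type (uint64_t here); hypot_shift, isqrt32 and abs_u32 are illustrative names only, and isqrt32 stands in for the existing 32-bit square root.

```c
#include <stdint.h>

static uint32_t abs_u32(int32_t v)
{
    return v < 0 ? (uint32_t)0 - (uint32_t)v : (uint32_t)v;
}

/* Bit-by-bit integer square root: floor(sqrt(n)) for a 32-bit n. */
static uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = UINT32_C(1) << 30;
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* 64-bit sum of squares, shifted down to 32 bits before the square root. */
uint32_t hypot_shift(int32_t a, int32_t b)
{
    uint32_t ua = abs_u32(a), ub = abs_u32(b);

    /* Each square is at most 2^62, so the sum is at most 2^63. */
    uint64_t sum = (uint64_t)ua * ua + (uint64_t)ub * ub;

    /* Shift right two bits at a time until the sum fits in 32 bits. */
    unsigned shift = 0;
    while ((sum >> 32) != 0) {
        sum >>= 2;
        shift += 2;
    }

    /* sqrt(sum * 4^n) == sqrt(sum) * 2^n, so undo half the shift. */
    return isqrt32((uint32_t)sum) << (shift / 2);
}
```

One caveat with this sketch: a 32-bit integer square root returns at most 16 significant bits, so the final result is only good to about 1 part in 2^15 whenever a shift was needed, even though truncating the sum itself loses far less.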

tom10 answered Sep 28 '22 06:09


Aniko and John, it seems to me that you haven't addressed the OP's problem. If a and b are integers, then a*a + b*b is likely to overflow, because integer operations are being performed. The obvious solution is to convert a and b to floating-point values before computing a*a + b*b. But the OP hasn't let us know what language we should use, so we're a bit stuck.

TonyK answered Sep 28 '22 07:09