I'd like to do a hypot2 calculation on a 16-bit processor.
The standard formula is c = sqrt((a * a) + (b * b))
. The problem with this is with large inputs it overflows. E.g. 200 and 250, multiply 200 * 200 to get 90,000 which is higher than the max signed value of 32,767, so it overflows, as does b, the numbers are added and the result may as well be useless; it might even signal an error condition because of a negative sqrt.
In my case, I'm dealing with 32-bit numbers, but 32-bit multiply on my processor is very fast, about 4 cycles. I'm using a dsPIC microcontroller. I'd rather not have to multiply with 64-bit numbers because that's wasting precious memory and undoubtedly will be slower. Additionally I only have sqrt for 32-bit numbers, so 64-bit numbers would require another function. So how can I compute a hypot when the values may be large?
Please note I can only really use integer math for this. Using anything like floating point math incurs a speed hit which I'd rather avoid. My processor has a fast integer/fixed point atan2 routine, about 130 cycles; could I use this to compute the hypotenuse length?
Depending on how much accuracy you need you may be able to avoid the squares and the square root operation. There is a section on this topic in Understanding Digital Signal Processing by Rick Lyons (section 10.2, "High-Speed Vector-Magnitude Approximation", starting at page 400 in my edition).
The approximation is essentially:
magnitude = alpha * min + beta * max
where max and min are the maximum and minimum absolute values of the real and imaginary components, and alpha and beta are two constants which are chosen to give a reasonable error distribution over the range of interest. These constants can be represented as fractions with power of 2 divisors to keep the arithemtic simple/efficient. In the book he suggests alpha = 15/16, beta = 15/32, and you can then simplify the formula to:
magnitude = (15 / 16) * (max + min / 2)
which might be implemented as follows using integer operations:
magnitude = 15 * (max + min / 2) / 16
and of course we can use shifts for the divides:
magnitude = (15 * (max + (min >> 1))) >> 4
Error is +/- 5% over a quadrant.
More information on this technique here: http://www.dspguru.com/dsp/tricks/magnitude-estimator
This is taken verbatim from this @John D. Cook blog post, hence CW:
Here’s how to compute
sqrt(x*x + y*y)
without risking overflow.
max = maximum(|x|, |y|)
min = minimum(|x|, |y|)
r = min / max
return max*sqrt(1 + r*r)
If @John D. Cook comes along and posts this you should give him the accept :)
Since you essentially can't do any multiplications without overflow you're likely going to lose some precision.
To get the numbers into an acceptable range, pull out some factor x
and use
c = x*sqrt( (a/x)*(a/x) + (b/x)*(b/x) )
If x
is a common factor, you won't lose precision, but if it's not, you will lose precision.
Update: Even better, given that you can do some mild work with 64-bit numbers, with just one 64-bit addition, you could do the rest of this problem in 32-bits with only a tiny loss of accuracy. To do this: do the two 32-bit multiplications to give you two 64-bit numbers, add these, and then bit shift as needed to get the sum back down to 32-bits before taking the square root. If you always bit shift by 2 bits, then just multiply the final result by 2^(half the number of bit shifts), based on the rule above. The truncation should only cause a very small loss of accuracy, no more than 2^31, or 0.00000005% error.
Aniko and John, it seems to me that you haven't addressed the OP's problem. If a and b are integers, then a*a + b*b is likely to overflow, because integer operations are being performed. The obvious solution is to convert a and b to floating-point values before computing a*a + b*b. But the OP hasn't let us know what language we should use, so we're a bit stuck.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With