 

std::norm(std::complex) uses square root instead of fast implementation

Tags:

c++

In C++ the norm of a complex number z is defined as abs(z)^2, i.e. re(z)^2 + im(z)^2.

This is the implementation (from libstdc++):

template<bool>
struct _Norm_helper
{
  template<typename _Tp>
    static inline _Tp _S_do_it(const complex<_Tp>& __z)
    {
      const _Tp __x = __z.real();
      const _Tp __y = __z.imag();
      return __x * __x + __y * __y;
    }
};

template<>
struct _Norm_helper<true>
{
  template<typename _Tp>
    static inline _Tp _S_do_it(const complex<_Tp>& __z)
    {
      _Tp __res = std::abs(__z);
      return __res * __res;
    }
};

Why would anyone want to use the second implementation?

The first one is clearly faster, because it doesn't call abs, which involves a square root.

asked Apr 30 '14 by tly

2 Answers

If we look into the implementation, we will find the answer there:

  // 26.2.7/5: norm(__z) returns the squared magnitude of __z.
  //     As defined, norm() is -not- a norm in the common mathematical
  //     sense used in numerics.  The helper class _Norm_helper<> tries to
  //     distinguish between builtin floating point and the rest, so as
  //     to deliver an answer as close as possible to the real value.
  template<bool>
    struct _Norm_helper
    {
      template<typename _Tp>
        static inline _Tp _S_do_it(const complex<_Tp>& __z)
        {
          const _Tp __x = __z.real();
          const _Tp __y = __z.imag();
          return __x * __x + __y * __y;
        }
    };

  template<>
    struct _Norm_helper<true>
    {
      template<typename _Tp>
        static inline _Tp _S_do_it(const complex<_Tp>& __z)
        {
          _Tp __res = std::abs(__z);
          return __res * __res;
        }
    };

  template<typename _Tp>
    inline _Tp
    norm(const complex<_Tp>& __z)
    {
      return _Norm_helper<__is_floating<_Tp>::__value 
         && !_GLIBCXX_FAST_MATH>::_S_do_it(__z);
    }

So the second implementation is used when norm is applied to a value of a builtin floating-point type (which is float, double, long double, or __float128 as of GCC 4.8.1) and the -ffast-math option is not set. This is done to conform with the standard's definition, where norm is specified as the squared magnitude of z.

Due to rounding errors, z.real()*z.real() + z.imag()*z.imag() is not, in general, equal to abs(z)*abs(z), so the first version would be inconsistent with the specification's wording (which probably indicates that there is a problem with the specification). To see why the wording matters, consider code that expects norm(z) / abs(z) == abs(z) to hold exactly. That is, of course, bad code, but the standard in some sense guaranteed it to be true.

However, once -ffast-math is set, or when complex is specialized for a non-builtin type, the standard no longer has this power (it clearly says that the behavior is undefined in those cases), and the implementation falls back to the first version, which is arguably¹ faster and more precise.


¹ It actually depends on many factors (such as whether builtin intrinsics are used) and so on, so let's take this claim with a grain of salt.

answered Oct 04 '22 by ivg


Look up the similar function std::hypot on cppreference.com, which says "Computes the square root of the sum of the squares of x and y, without undue overflow or underflow at intermediate stages of the computation."

(I'm guessing that std::abs is usually just a wrapper around std::hypot).

This is the key: it's all about numerical stability. Consider the case y=0, and x>0 being either very large or very small, for the floating point representation.

Thus if x is very large, so that x^2 overflows to +inf, just computing sqrt(x^2 + y^2) gives +inf. If x is very small, so that x^2 underflows to 0, you'd get 0.

(Always remembering that we're talking about floating point representations).

In either case, it is very different from the correct answer x.

The correct algorithm is: consider, e.g., the case x > y > 0. Let r = y/x and compute x * sqrt(1 + r^2) instead, which is much better behaved numerically; note that 0 < r < 1. In general you'd take the larger of |x| and |y| and divide it out, taking care of the case where x and y are both zero.

Any good std::abs implementation that cares about accuracy rather than speed is, I bet, doing something similar to the above, NOT computing sqrt(x^2 + y^2) directly.

You can do some simple numerical analysis (if you know a bit of calculus and Taylor series, and/or the binomial expansion for power p = 1/2): a few algebraic calculations to work out the likely sizes of the errors will show that the naive sqrt(x^2 + y^2) formula is more error-prone.

answered Oct 04 '22 by user5428643