 

std::norm(std::complex) uses square root instead of fast implementation

Tags:

c++

In C++ the norm of a complex number z is defined as abs(z)^2, i.e. re(z)^2 + im(z)^2.

This is the implementation (from libstdc++):

template<bool>
struct _Norm_helper
{
  template<typename _Tp>
    static inline _Tp _S_do_it(const complex<_Tp>& __z)
    {
      const _Tp __x = __z.real();
      const _Tp __y = __z.imag();
      return __x * __x + __y * __y;
    }
};

template<>
struct _Norm_helper<true>
{
  template<typename _Tp>
    static inline _Tp _S_do_it(const complex<_Tp>& __z)
    {
      _Tp __res = std::abs(__z);
      return __res * __res;
    }
};

Why would anyone want to use the second implementation?

The first one is clearly faster, because it doesn't call abs, which involves a square root.

asked Apr 30 '14 by tly

2 Answers

If we look into the implementation, we will find the answer there:

  // 26.2.7/5: norm(__z) returns the squared magnitude of __z.
  //     As defined, norm() is -not- a norm in the common mathematical
  //     sense used in numerics.  The helper class _Norm_helper<> tries to
  //     distinguish between builtin floating point and the rest, so as
  //     to deliver an answer as close as possible to the real value.
  template<bool>
    struct _Norm_helper
    {
      template<typename _Tp>
        static inline _Tp _S_do_it(const complex<_Tp>& __z)
        {
          const _Tp __x = __z.real();
          const _Tp __y = __z.imag();
          return __x * __x + __y * __y;
        }
    };

  template<>
    struct _Norm_helper<true>
    {
      template<typename _Tp>
        static inline _Tp _S_do_it(const complex<_Tp>& __z)
        {
          _Tp __res = std::abs(__z);
          return __res * __res;
        }
    };

  template<typename _Tp>
    inline _Tp
    norm(const complex<_Tp>& __z)
    {
      return _Norm_helper<__is_floating<_Tp>::__value 
         && !_GLIBCXX_FAST_MATH>::_S_do_it(__z);
    }

So the second implementation is used when norm is applied to a value of a builtin floating-point type (which is float, double, long double, or __float128 as of GCC 4.8.1) and the -ffast-math option is not set. This is done to conform with the standard's definition, where norm is specified as the squared magnitude of z.

Due to rounding errors, z.real()*z.real() + z.imag()*z.imag() is not, in general, equal to abs(z)*abs(z), so the first version would be inconsistent with the specification's wording (which probably indicates that there is a problem with the specification). To see why the wording matters, consider code that expects norm(z) / abs(z) == abs(z) to hold exactly. That is, of course, bad code, but the standard in some sense guaranteed it to be true.

However, once -ffast-math is set, or when complex is specialized for a non-builtin type, the standard no longer has this power (it clearly says that the behavior is undefined in those cases), and the implementation falls back to the first version, which is arguably¹ faster and more precise.


¹ It actually depends on many factors (such as whether builtin intrinsics are used) and so on, so let's take this claim with a grain of salt.

answered Oct 04 '22 by ivg


Look up the similar function std::hypot on cppreference.com, which says "Computes the square root of the sum of the squares of x and y, without undue overflow or underflow at intermediate stages of the computation."

(I'm guessing that std::abs is usually just a wrapper around std::hypot).

This is the key: it's all about numerical stability. Consider the case y=0, and x>0 being either very large or very small, for the floating point representation.

Thus if x is very large, so that x^2 overflows to +inf, just computing sqrt(x^2 + y^2) gives +inf. If x is very small, so that x^2 underflows to 0, you'd get 0.

(Always remembering that we're talking about floating point representations).

In either case, it is very different from the correct answer x.

The correct algorithm is: consider, e.g., the case x > y > 0. Let r = y/x and compute x * sqrt(1 + r^2) instead, which is much better behaved numerically; note that 0 < r < 1. In general you'd take the larger of |x| and |y| and divide it out, taking care of the case where x and y are both zero.

Any good std::abs implementation that cares about accuracy rather than speed is, I bet, doing something similar to the above, NOT computing sqrt(x^2 + y^2) directly.

You can do some simple numerical analysis (if you know a bit of calculus and Taylor series, and/or the binomial expansion for power p = 1/2): a few algebraic calculations to work out the likely sizes of the errors will show that the naive sqrt(x^2 + y^2) formula is more error-prone.

answered Oct 04 '22 by user5428643