CUDA C best practices: unsigned vs signed optimization

Tags: c, cuda

In the CUDA C Best Practices Guide there is a small section about using signed and unsigned integers.

In the C language standard, unsigned integer overflow semantics are well defined, whereas signed integer overflow causes undefined results. Therefore, the compiler can optimize more aggressively with signed arithmetic than it can with unsigned arithmetic. This is of particular note with loop counters: since it is common for loop counters to have values that are always positive, it may be tempting to declare the counters as unsigned. For slightly better performance, however, they should instead be declared as signed.

For example, consider the following code:

    for (i = 0; i < n; i++) {
        out[i] = in[offset + stride*i];
    }

Here, the sub-expression stride*i could overflow a 32-bit integer, so if i is declared as unsigned, the overflow semantics prevent the compiler from using some optimizations that might otherwise have applied, such as strength reduction. If instead i is declared as signed, where the overflow semantics are undefined, the compiler has more leeway to use these optimizations.

The first two sentences in particular confuse me. If the semantics of unsigned values are well defined and signed values can produce undefined results, how is it that the compiler can produce better code for the latter?
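For concreteness, I take the two variants being compared to be something like this (my own sketch, not the guide's):

    /* "tempting" version: unsigned counter. Wrap-around of the unsigned
       index arithmetic is well defined, so it must be preserved. */
    void copy_unsigned(float *out, const float *in,
                       int offset, int stride, unsigned int n)
    {
        for (unsigned int i = 0; i < n; i++)
            out[i] = in[offset + stride * i];
    }

    /* recommended version: signed counter. Overflow would be undefined
       behavior, which gives the compiler more room to optimize. */
    void copy_signed(float *out, const float *in,
                     int offset, int stride, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = in[offset + stride * i];
    }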

asked Dec 31 '12 by Barry Brown

2 Answers

The text shows this example:

for (i = 0; i < n; i++) {
    out[i] = in[offset + stride*i];
}

It also mentions "strength reduction". The compiler is allowed to replace this with the following "pseudo-optimised-C" code:

tmp = offset;
for (i = 0; i < n; i++) {
    out[i] = in[tmp];
    tmp += stride;    /* strength reduction: tmp tracks offset + stride*i */
}

Now, imagine a processor that only supports floating-point numbers (with integers as a subset). On such a machine, tmp would be of type "very large number", with no inherent 32-bit wrap-around of its own.

Now, the C standard says that a computation involving unsigned operands can never overflow; instead, the result is reduced modulo one more than the largest value the type can represent. That means that in the case of unsigned i the compiler has to do this:

tmp = offset;
for (i = 0; i < n; i++) {
    out[i] = in[tmp];
    tmp += stride;
    if (tmp > UINT_MAX)
    {
        /* emulate the mandatory modulo-2^32 wrap-around; remember that
           tmp is the imaginary machine's "very large number" type */
        tmp -= UINT_MAX + 1;
    }
}

But in the case of a signed integer, the compiler can do whatever it wants. It doesn't need to check for overflow: if the computation does overflow, that's the developer's problem (the program could raise an exception or produce erroneous values). So the code can be faster.
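Concretely, here is what the compiler may emit for signed i, in the same "pseudo-optimised-C" (my sketch, not from the guide):

/* No wrap-around bookkeeping at all: a program in which
   offset + stride*i overflowed would have undefined behavior anyway,
   so tmp may even be kept in a wider type or register. */
long long tmp = offset;
for (i = 0; i < n; i++) {
    out[i] = in[tmp];
    tmp += stride;    /* no overflow check needed */
}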

answered Sep 24 '22 by Omri Barel


It's because the C standard limits what the compiler writer can do in the case of unsigned integers: wrap-around is fully specified, so it must be preserved. There is more leeway to fool around with what happens when signed integers overflow, so the compiler writers have more room to move, so to speak.

That's the way I read it.
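A tiny standalone example of that leeway (my own, not from the guide):

/* Signed: the compiler may assume x + 1 never overflows, since signed
   overflow is undefined, so this can be folded to a constant 1. */
int grows_signed(int x) { return x + 1 > x; }

/* Unsigned: x + 1 wraps to 0 when x == UINT_MAX, making the result
   false in that one case, so the comparison must actually be kept. */
int grows_unsigned(unsigned int x) { return x + 1 > x; }

Mainstream compilers at -O2 typically fold grows_signed to return 1; while grows_unsigned keeps a real comparison.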

answered Sep 22 '22 by Lee Meador