Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How did I get a value larger than 8 bits in size from an 8-bit integer?

This is a compiler bug.

Although getting impossible results for undefined behaviour is a valid consequence, there is actually no undefined behaviour in your code. What's happening is that the compiler thinks the behaviour is undefined, and optimises accordingly.

If c is defined as int8_t, and int8_t promotes to int, then c-- is supposed to perform the subtraction c - 1 in int arithmetic and convert the result back to int8_t. The subtraction in int does not overflow, and converting out-of-range integral values to another integral type is valid. If the destination type is signed, the result is implementation-defined, but it must be a valid value for the destination type. (And if the destination type is unsigned, the result is well-defined, but that does not apply here.)


A compiler can have bugs which are other than nonconformances to the standard, because there are other requirements. A compiler should be compatible with other versions of itself. It may also be expected to be compatible in some ways with other compilers, and also to conform to some beliefs about behavior that are held by the majority of its user base.

In this case, it appears to be a conformance bug. The expression c-- should manipulate c in a way similar to c = c - 1. Here, the value of c on the right is promoted to type int, and then the subtraction takes place. Since c is in the range of int8_t, this subtraction will not overflow, but it may produce a value which is out of the range of int8_t. When this value is assigned, a conversion takes place back to the type int8_t so the result fits back into c. In the out-of-range case, the conversion has an implementation-defined value. But a value out of the range of int8_t is not a valid implementation-defined value. An implementation cannot "define" that an 8 bit type suddenly holds 9 or more bits. For the value to be implementation-defined means that something in the range of int8_t is produced, and the program continues. The C standard thereby allows for behaviors such as saturation arithmetic (common on DSP's) or wrap-around (mainstream architectures).

The compiler is using a wider underlying machine type when manipulating values of small integer types like int8_t or char. When arithmetic is performed, results which are out of range of the small integer type can be captured reliably in this wider type. To preserve the externally visible behavior that the variable is an 8 bit type, the wider result has to be truncated into the 8 bit range. Explicit code is required to do that since the machine storage locations (registers) are wider than 8 bits and happy with the larger values. Here, the compiler neglected to normalize the value and simply passed it to printf as is. The conversion specifier %i in printf has no idea that the argument originally came from int8_t calculations; it is just working with an int argument.


I can't fit this in a comment, so I'm posting it as an answer.

For some very odd reason, the -- operator happens to be the culprit.

I tested the code posted on Ideone and replaced c-- with c = c - 1 and the values remained within the range [-128 ... 127]:

c: -123
c: -124
c: -125
c: -126
c: -127
c: -128 // about to overflow
c: 127  // woop
c: 126
c: 125
c: 124
c: 123
c: 122

Freaky ey? I don't know much about what the compiler does to expressions like i++ or i--. It's likely promoting the return value to an int and passing it. That's the only logical conclusion I can come up with because you ARE in fact getting values that cannot fit into 8-bits.


I guess that the underlying hardware is still using a 32-bit register to hold that int8_t. Since the specification does not impose a behaviour for overflow, the implementation does not check for overflow and allows larger values to be stored as well.


If you mark the local variable as volatile you are forcing to use memory for it and consequently obtain the expected values within the range.


The assembler code reveals the problem:

:loop
mov esi, ebx
xor eax, eax
mov edi, OFFSET FLAT:.LC2   ;"c: %i\n"
sub ebx, 1
call    printf
cmp ebx, -301
jne loop

mov esi, -45
mov edi, OFFSET FLAT:.LC2   ;"c: %i\n"
xor eax, eax
call    printf

EBX should be anded with FF post decrement, or only BL should be used with the remainder of EBX clear. Curious that it uses sub instead of dec. The -45 is flat-out mysterious. It's the bitwise inversion of 300 & 255 = 44. -45 = ~44. There's a connection somewhere.

It goes through a lot more work using c = c - 1:

mov eax, ebx
mov edi, OFFSET FLAT:.LC2   ;"c: %i\n"
add ebx, 1
not eax
movsx   ebp, al                 ;uses only the lower 8 bits
xor eax, eax
mov esi, ebp

It then uses only the low portion of RAX, so it's restricted to -128 thru 127. Compiler options "-g -O2".

With no optimization, it produces correct code:

movzx   eax, BYTE PTR [rbp-1]
sub eax, 1
mov BYTE PTR [rbp-1], al
movsx   edx, BYTE PTR [rbp-1]
mov eax, OFFSET FLAT:.LC2   ;"c: %i\n"
mov esi, edx

So it's a bug in the optimizer.