Considering the following code:
#include <stdio.h>
int main()
{
char A = A ? 0[&A] & !A : A^A;
putchar(A);
}
I'd like to ask whether this code has any undefined behaviour.
Edit
Please note: the code intentionally uses 0[&A] & !A and NOT A & !A (see response below)
End edit
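For readers unfamiliar with the 0[&A] spelling: C defines E1[E2] as *((E1)+(E2)), so 0[&A] designates the same object as (&A)[0], and it also takes the address of A. A minimal sketch of that equivalence, assuming an initialized variable (unlike A in the question); the function name subscript_forms_agree is hypothetical:

```c
#include <assert.h>

/* Returns 1 if every spelling below reads the same value as plain `a`. */
int subscript_forms_agree(char a)
{
    char *p = &a;

    /* E1[E2] is defined as *((E1) + (E2)), and addition commutes,
       so 0[p] and p[0] name the same object. */
    assert(&0[p] == &p[0]);

    return 0[&a] == a && (&a)[0] == a && *(0 + &a) == a;
}
```

Because the parameter here has a definite value, reading it through any of these spellings is well defined; the sketch says nothing about the uninitialized case in the question.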
Taking the output ASM from g++ 6.3 (https://godbolt.org/g/4db6uO) we get (no optimizations were used):
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov BYTE PTR [rbp-1], 0
movzx eax, BYTE PTR [rbp-1]
movsx eax, al
mov edi, eax
call putchar
mov eax, 0
leave
ret
However clang gives a lot more code for the same thing (no optimizations again):
main: # @main
push rbp
mov rbp, rsp
sub rsp, 16
mov dword ptr [rbp - 4], 0
cmp byte ptr [rbp - 5], 0
je .LBB0_2
movsx eax, byte ptr [rbp - 5]
cmp byte ptr [rbp - 5], 0
setne cl
xor cl, -1
and cl, 1
movzx edx, cl
and eax, edx
mov dword ptr [rbp - 12], eax # 4-byte Spill
jmp .LBB0_3
.LBB0_2:
movsx eax, byte ptr [rbp - 5]
movsx ecx, byte ptr [rbp - 5]
xor eax, ecx
mov dword ptr [rbp - 12], eax # 4-byte Spill
.LBB0_3:
mov eax, dword ptr [rbp - 12] # 4-byte Reload
mov cl, al
mov byte ptr [rbp - 5], cl
movsx edi, byte ptr [rbp - 5]
call putchar
mov edi, dword ptr [rbp - 4]
mov dword ptr [rbp - 16], eax # 4-byte Spill
mov eax, edi
add rsp, 16
pop rbp
ret
And Microsoft VC compiler gives:
EXTRN _putchar:PROC
tv76 = -12 ; size = 4
tv69 = -8 ; size = 4
_A$ = -1 ; size = 1
_main PROC
push ebp
mov ebp, esp
sub esp, 12 ; 0000000cH
movsx eax, BYTE PTR _A$[ebp]
test eax, eax
je SHORT $LN5@main
movsx ecx, BYTE PTR _A$[ebp]
test ecx, ecx
jne SHORT $LN3@main
mov DWORD PTR tv69[ebp], 1
jmp SHORT $LN4@main
$LN3@main:
mov DWORD PTR tv69[ebp], 0
$LN4@main:
mov edx, 1
imul eax, edx, 0
movsx ecx, BYTE PTR _A$[ebp+eax]
and ecx, DWORD PTR tv69[ebp]
mov DWORD PTR tv76[ebp], ecx
jmp SHORT $LN6@main
$LN5@main:
movsx edx, BYTE PTR _A$[ebp]
movsx eax, BYTE PTR _A$[ebp]
xor edx, eax
mov DWORD PTR tv76[ebp], edx
$LN6@main:
mov cl, BYTE PTR tv76[ebp]
mov BYTE PTR _A$[ebp], cl
movsx edx, BYTE PTR _A$[ebp]
push edx
call _putchar
add esp, 4
xor eax, eax
mov esp, ebp
pop ebp
ret 0
_main ENDP
But with optimizations we get much cleaner code (gcc and clang):
main: # @main
push rax
mov rsi, qword ptr [rip + stdout]
xor edi, edi
call _IO_putc
xor eax, eax
pop rcx
ret
And somewhat mysterious VC code (it seems the VC compiler can't take a joke: it just does not precompute the right-hand side):
EXTRN _putchar:PROC
_A$ = -1 ; size = 1
_main PROC ; COMDAT
push ecx
mov cl, BYTE PTR _A$[esp+4]
test cl, cl
je SHORT $LN3@main
mov al, cl
xor al, 1
and cl, al
jmp SHORT $LN4@main
$LN3@main:
xor cl, cl
$LN4@main:
movsx eax, cl
push eax
call _putchar
xor eax, eax
pop ecx
pop ecx
ret 0
_main ENDP
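For comparison, here is a sketch of what the initializer evaluates to when the operand has a definite value. This is an assumption that does not hold for the uninitialized A in the question, and the function names are hypothetical. For every such input the result is 0, which is consistent with gcc and clang folding the call to putchar(0):

```c
#include <assert.h>
#include <limits.h>

/* The question's initializer, applied to an already-initialized copy. */
unsigned char initializer_value(unsigned char a)
{
    /* Nonzero a: !a is 0, so a & 0 is 0.  Zero a: a ^ a is 0. */
    return a ? 0[&a] & !a : a ^ a;
}

/* Checks every possible unsigned char input; returns how many were checked. */
int check_all_inputs(void)
{
    int checked = 0;

    for (int v = 0; v <= UCHAR_MAX; ++v) {
        assert(initializer_value((unsigned char)v) == 0);
        ++checked;
    }
    return checked;
}
```

This only shows that the arithmetic is constant for definite inputs; whether the original program is allowed to rely on that is the question being asked.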
Some Warnings:
Some Explanations:
A is used in its own initialization. Again: you should not do this.
So I am in a dilemma right now: is this UB or not?
First of all, if char corresponds to unsigned char, a char cannot have a trap representation; however if char corresponds to signed char it can have trap representations. Since using a trap representation has undefined behaviour, it is more interesting to modify the code to use unsigned char:
unsigned char A = A ? 0[&A] & !A : A^A;
putchar(A);
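Whether plain char behaves like signed char or unsigned char is implementation-defined. A small sketch of how one might check this on a given compiler (the function name plain_char_is_signed is hypothetical):

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int plain_char_is_signed(void)
{
    /* Two independent checks that should always agree. */
    int by_limits = CHAR_MIN < 0;
    int by_conversion = (char)-1 < 0;

    assert(by_limits == by_conversion);
    return by_limits;
}
```

Switching the declaration to unsigned char, as above, removes this dependence on the implementation.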
Initially I believed that this code has no undefined behaviour in C. The question is whether A is uninitialized in a manner that has undefined behaviour, and the answer is "no": although it is a local variable with automatic storage duration, it has its address taken, so it must reside in memory, and its type is unsigned char, so its value is unspecified but cannot be a trap representation.
C11 Annex J.2 lists the following as undefined behaviour:
An lvalue designating an object of automatic storage duration that could have been declared with the register storage class is used in a context that requires the value of the designated object, but the object is uninitialized. (6.3.2.1).
with 6.3.2.1p2 saying that
If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
Since the address of A is taken, it could not have been declared with the register storage class, and therefore its use does not have undefined behaviour under 6.3.2.1p2; instead it would have an unspecified yet valid unsigned char value, because unsigned char has no trap representations.
However, the problem is that nothing requires A to yield the same unspecified value each time it is read, as an unspecified value is a
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance
And the answer to C11 Defect Report 451 seems to consider this to have undefined behaviour after all: it says that using an indeterminate value in arithmetic expressions (even with types that have no trap representations, such as unsigned char) produces a result that is itself unstable, and that using such values in library functions has undefined behaviour.
Thus:
unsigned char A = A ? 0[&A] & !A : A^A;
doesn't invoke undefined behaviour as such, but A is still initialized with an indeterminate value, and using such an indeterminate value in a call to the library function putchar(A) should be considered as having undefined behaviour:
Proposed Committee Response
- The answer to question 1 is "yes", an uninitialized value under the conditions described can appear to change its value.
- The answer to question 2 is that any operation performed on indeterminate values will have an indeterminate value as a result.
- The answer to question 3 is that library functions will exhibit undefined behavior when used on indeterminate values.
- These answers are appropriate for all types that do not have trap representations.
- This viewpoint reaffirms the C99 DR260 position.
- The committee agrees that this area would benefit from a new definition of something akin to a "wobbly" value and that this should be considered in any subsequent revision of this standard.
- The committee also notes that padding bytes within structures are possibly a distinct form of "wobbly" representation.