
Undefined behaviour or not undefined behaviour

Consider the following code:

#include <stdio.h>

int main()
{
    char A = A ? 0[&A] & !A : A^A;
    putchar(A);
}

I'd like to ask whether this code exhibits any undefined behaviour.

Edit

Please note: the code intentionally uses 0[&A] & !A and NOT A & !A (see response below)

End edit

Taking the assembly output from g++ 6.3 (https://godbolt.org/g/4db6uO), with no optimizations, we get:

main:
    push    rbp
    mov     rbp, rsp
    sub     rsp, 16
    mov     BYTE PTR [rbp-1], 0
    movzx   eax, BYTE PTR [rbp-1]
    movsx   eax, al
    mov     edi, eax
    call    putchar
    mov     eax, 0
    leave
    ret

However clang gives a lot more code for the same thing (no optimizations again):

main:                                   # @main
    push    rbp
    mov     rbp, rsp
    sub     rsp, 16
    mov     dword ptr [rbp - 4], 0
    cmp     byte ptr [rbp - 5], 0
    je      .LBB0_2
    movsx   eax, byte ptr [rbp - 5]
    cmp     byte ptr [rbp - 5], 0
    setne   cl
    xor     cl, -1
    and     cl, 1
    movzx   edx, cl
    and     eax, edx
    mov     dword ptr [rbp - 12], eax # 4-byte Spill
    jmp     .LBB0_3
.LBB0_2:
    movsx   eax, byte ptr [rbp - 5]
    movsx   ecx, byte ptr [rbp - 5]
    xor     eax, ecx
    mov     dword ptr [rbp - 12], eax # 4-byte Spill
.LBB0_3:
    mov     eax, dword ptr [rbp - 12] # 4-byte Reload
    mov     cl, al
    mov     byte ptr [rbp - 5], cl
    movsx   edi, byte ptr [rbp - 5]
    call    putchar
    mov     edi, dword ptr [rbp - 4]
    mov     dword ptr [rbp - 16], eax # 4-byte Spill
    mov     eax, edi
    add     rsp, 16
    pop     rbp
    ret

And the Microsoft VC compiler gives:

EXTRN   _putchar:PROC
tv76 = -12                                          ; size = 4
tv69 = -8                                         ; size = 4
_A$ = -1                                                ; size = 1
_main   PROC
    push     ebp
    mov      ebp, esp
    sub      esp, 12              ; 0000000cH
    movsx    eax, BYTE PTR _A$[ebp]
    test     eax, eax
    je       SHORT $LN5@main
    movsx    ecx, BYTE PTR _A$[ebp]
    test     ecx, ecx
    jne      SHORT $LN3@main
    mov      DWORD PTR tv69[ebp], 1
    jmp      SHORT $LN4@main
$LN3@main:
    mov      DWORD PTR tv69[ebp], 0
$LN4@main:
    mov      edx, 1
    imul     eax, edx, 0
    movsx    ecx, BYTE PTR _A$[ebp+eax]
    and      ecx, DWORD PTR tv69[ebp]
    mov      DWORD PTR tv76[ebp], ecx
    jmp      SHORT $LN6@main
$LN5@main:
    movsx    edx, BYTE PTR _A$[ebp]
    movsx    eax, BYTE PTR _A$[ebp]
    xor      edx, eax
    mov      DWORD PTR tv76[ebp], edx
$LN6@main:
    mov      cl, BYTE PTR tv76[ebp]
    mov      BYTE PTR _A$[ebp], cl
    movsx    edx, BYTE PTR _A$[ebp]
    push     edx
    call     _putchar
    add      esp, 4
    xor      eax, eax
    mov      esp, ebp
    pop      ebp
    ret      0
_main   ENDP

But with optimizations enabled we get much cleaner code (gcc and clang):

main:                                   # @main
    push    rax
    mov     rsi, qword ptr [rip + stdout]
    xor     edi, edi
    call    _IO_putc
    xor     eax, eax
    pop     rcx
    ret

And somewhat mysterious VC code (it seems the VC compiler can't take a joke: it simply does not precompute the right-hand side):

EXTRN   _putchar:PROC
_A$ = -1                                                ; size = 1
_main   PROC                                      ; COMDAT
    push     ecx
    mov      cl, BYTE PTR _A$[esp+4]
    test     cl, cl
    je       SHORT $LN3@main
    mov      al, cl
    xor      al, 1
    and      cl, al
    jmp      SHORT $LN4@main
$LN3@main:
    xor      cl, cl
$LN4@main:
    movsx    eax, cl
    push     eax
    call     _putchar
    xor      eax, eax
    pop      ecx
    pop      ecx
    ret      0
_main   ENDP

Some Warnings:

  1. You should not write code like this. It is definitely bad coding style and should never go into a serious application. This is just for fun.

Some Explanations:

  1. I suspect undefined behaviour because the value of A is used in its own initialization. Again: you should not do this.
  2. However, the way the expression is built, both branches yield 0, as the optimizing compilers (gcc and clang above) worked out.

So my dilemma is whether this is UB or not.

Ferenc Deak asked Apr 12 '26 18:04


1 Answer

First of all, if char corresponds to unsigned char, a char cannot have a trap representation; however if char corresponds to signed char it can have trap representations. Since using a trap representation has undefined behaviour, it is more interesting to modify the code to use unsigned char:

unsigned char A = A ? 0[&A] & !A : A^A;
putchar(A);

Initially I believed that there isn't any undefined behaviour in C. The question is whether A is uninitialized in a manner that has undefined behaviour, and the answer is "no": although it is a local variable with automatic storage duration, it has its address taken, so it must reside in memory, and its type is unsigned char, so its value is unspecified but cannot be a trap representation.

C11 Annex J.2 specifies that the following has undefined behaviour:

An lvalue designating an object of automatic storage duration that could have been declared with the register storage class is used in a context that requires the value of the designated object, but the object is uninitialized. (6.3.2.1).

with 6.3.2.1p2 saying that

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

Since the address of A is taken, it could not have been declared with the register storage class, and therefore its use does not have undefined behaviour under 6.3.2.1p2; instead it would have an unspecified yet valid value, because unsigned char has no trap representations.

However, the problem is that nothing requires A to yield the same unspecified value each time it is read, since an unspecified value is a

valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

Moreover, the response to C11 Defect Report 451 seems to consider this undefined behaviour after all. It says that using an indeterminate value in arithmetic expressions (even with types that have no trap representations, such as unsigned char) produces a result that is itself unstable, and that passing such values to library functions has undefined behaviour.

Thus:

unsigned char A = A ? 0[&A] & !A : A^A;

doesn't invoke undefined behaviour as such, but A is still initialized with an indeterminate value, and using such an indeterminate value in a call to the library function putchar(A) should be considered as having undefined behaviour:

Proposed Committee Response

  • The answer to question 1 is "yes", an uninitialized value under the conditions described can appear to change its value.
  • The answer to question 2 is that any operation performed on indeterminate values will have an indeterminate value as a result.
  • The answer to question 3 is that library functions will exhibit undefined behavior when used on indeterminate values.
  • These answers are appropriate for all types that do not have trap representations.
  • This viewpoint reaffirms the C99 DR260 position.
  • The committee agrees that this area would benefit from a new definition of something akin to a "wobbly" value and that this should be considered in any subsequent revision of this standard.
  • The committee also notes that padding bytes within structures are possibly a distinct form of "wobbly" representation.

