
Inconsistent truncation of unsigned bitfield integer expressions between C++ and C in different compilers


I was debugging a strange test failure: a function that previously lived in a C++ source file started returning incorrect results after being moved verbatim into a C file. The minimal example below reproduces the problem with GCC. However, when I compiled the example with Clang on a whim (and later with Visual Studio), I got a different result! I cannot figure out whether to treat this as a bug in one of the compilers or as a manifestation of behavior that the C or C++ standard leaves undefined. Strangely, none of the compilers gave me any warnings about the expression.

The culprit is this expression:

ctl.b.p52 << 12;

Here, p52 is declared as a 52-bit bit-field of type uint64_t; it is also part of a union (see control_t below). The shift does not lose any data, as the result still fits into 64 bits. However, GCC then truncates the result to 52 bits when I use the C compiler! With the C++ compiler, all 64 bits of the result are preserved.

To illustrate this, the example program below compiles two functions with identical bodies, and then compares their results. c_behavior() is placed in a C source file and cpp_behavior() in a C++ file, and main() does the comparison.

Repository with the example code: https://github.com/atakua/c-cpp-bitfields

Header common.h defines a union of a 64-bit integer and a struct of bit-fields totalling 64 bits, and declares two functions:

#ifndef COMMON_H
#define COMMON_H

#include <stdint.h>

typedef union control {
        uint64_t q;
        struct {
                uint64_t a: 1;
                uint64_t b: 1;
                uint64_t c: 1;
                uint64_t d: 1;
                uint64_t e: 1;
                uint64_t f: 1;
                uint64_t g: 4;
                uint64_t h: 1;
                uint64_t i: 1;
                uint64_t p52: 52;
        } b;
} control_t;

#ifdef __cplusplus
extern "C" {
#endif

uint64_t cpp_behavior(control_t ctl);
uint64_t c_behavior(control_t ctl);

#ifdef __cplusplus
}
#endif

#endif // COMMON_H

The functions have identical bodies, except that one is compiled as C and the other as C++.

c-part.c:

#include <stdint.h>
#include "common.h"
uint64_t c_behavior(control_t ctl) {
    return ctl.b.p52 << 12;
}

cpp-part.cpp:

#include <stdint.h>
#include "common.h"
uint64_t cpp_behavior(control_t ctl) {
    return ctl.b.p52 << 12;
}

main.c:

#include <stdio.h>
#include "common.h"

int main() {
    control_t ctl;
    ctl.q = 0xfffffffd80236000ull;

    uint64_t c_res = c_behavior(ctl);
    uint64_t cpp_res = cpp_behavior(ctl);
    const char *announce = c_res == cpp_res? "C == C++" : "OMG C != C++";
    printf("%s\n", announce);

    return c_res == cpp_res? 0: 1;
}

GCC shows the difference between the results they return:

$ gcc -Wpedantic main.c c-part.c cpp-part.cpp

$ ./a.exe
OMG C != C++

However, with Clang, C and C++ behave identically and as expected:

$ clang -Wpedantic main.c c-part.c cpp-part.cpp

$ ./a.exe
C == C++

With Visual Studio I get the same result as with Clang:

C:\Users\user\Documents>cl main.c c-part.c cpp-part.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24234.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.c
c-part.c
Generating Code...
Compiling...
cpp-part.cpp
Generating Code...
Microsoft (R) Incremental Linker Version 14.00.24234.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:main.exe
main.obj
c-part.obj
cpp-part.obj

C:\Users\user\Documents>main.exe
C == C++

I tried the examples on Windows, even though the original problem with GCC was discovered on Linux.

Grigory Rechistov asked Mar 17 '20 08:03


2 Answers

The difference shows up in the code gcc generates in C mode. You can compare the assembly code using Godbolt's Compiler Explorer.

Here is the source code for this test:

#include <stdint.h>

typedef union control {
    uint64_t q;
    struct {
        uint64_t a: 1;
        uint64_t b: 1;
        uint64_t c: 1;
        uint64_t d: 1;
        uint64_t e: 1;
        uint64_t f: 1;
        uint64_t g: 4;
        uint64_t h: 1;
        uint64_t i: 1;
        uint64_t p52: 52;
    } b;
} control_t;

uint64_t test(control_t ctl) {
    return ctl.b.p52 << 12;
}

The output in C mode (flags -xc -O2 -m32):

test:
        push    esi
        push    ebx
        mov     ebx, DWORD PTR [esp+16]
        mov     ecx, DWORD PTR [esp+12]
        mov     esi, ebx
        shr     ebx, 12
        shr     ecx, 12
        sal     esi, 20
        mov     edx, ebx
        pop     ebx
        or      esi, ecx
        mov     eax, esi
        shld    edx, esi, 12
        pop     esi
        sal     eax, 12
        and     edx, 1048575
        ret

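The problem is the last instruction before ret, and edx, 1048575, which clears the 12 most significant bits of the 64-bit result: 1048575 is 0xFFFFF, so only the low 20 bits of edx (the upper half of the result) survive.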

The output in C++ mode is identical except that the final and edx, 1048575 is absent:

test(control):
        push    esi
        push    ebx
        mov     ebx, DWORD PTR [esp+16]
        mov     ecx, DWORD PTR [esp+12]
        mov     esi, ebx
        shr     ebx, 12
        shr     ecx, 12
        sal     esi, 20
        mov     edx, ebx
        pop     ebx
        or      esi, ecx
        mov     eax, esi
        shld    edx, esi, 12
        pop     esi
        sal     eax, 12
        ret

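The output in 64-bit mode is much simpler, yet it too differs between the C and C++ compilers. The C version's mask 4503599627366400 (0x000FFFFFFFFFF000) also clears the 12 most significant bits, so the truncation is not limited to 32-bit code generation, while the C++ version's mask -4096 keeps them: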

# C code:
test:
        movabs  rax, 4503599627366400
        and     rax, rdi
        ret

# C++ code:
test(control):
        mov     rax, rdi
        and     rax, -4096
        ret

You should file a bug report on the gcc bug tracker.

chqrlie answered Dec 08 '22 13:12


C and C++ treat the types of bit-field members differently.

C 2018 6.7.2.1 10 says:

A bit-field is interpreted as having a signed or unsigned integer type consisting of the specified number of bits…

Observe this is not specific about the type—it is some integer type—and it does not say the type is the type that was used to declare the bit-field, as in the uint64_t a : 1; shown in the question. This apparently leaves it open to the implementation to choose the type.

C++ 2017 draft n4659 12.2.4 [class.bit] 1 says, of a bit-field declaration:

… The bit-field attribute is not part of the type of the class member…

This implies that, in a declaration such as uint64_t a : 1;, the : 1 is not part of the type of the class member a, so the type is as if it were uint64_t a;, and thus the type of a is uint64_t.

So it appears GCC treats a bit-field in C as an integer type exactly as wide as the bit-field (52 bits for p52), so the shift result is reduced to that width, and treats a bit-field in C++ as its declared type. This does not appear to violate the standards.

Eric Postpischil answered Dec 08 '22 13:12