
Setting extra bits in a bool makes it true and false at the same time

Tags: c++, boolean, undefined-behavior, abi, evaluation

If I take a bool variable and set its second bit to 1, the variable evaluates to true and false at the same time. Compile the following code with gcc 6.3 using the -g option (gcc-v6.3.0/Linux/RHEL6.0-2016-x86_64/bin/g++ -g main.cpp -o mytest_d) and run the executable. You get the following:

How can T be equal to true and false at the same time?

       value   bits 
       -----   ---- 
    T:   1     0001
after bit change
    T:   3     0011
T is true
T is false

This can happen when you call a function written in a different language (say, Fortran) whose definitions of true and false differ from those of C++. In Fortran, if any bit is nonzero the value is true; if all bits are zero the value is false.

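#include <iostream>
#include <bitset>

using namespace std;

// Sets the two lowest bits of the first byte pointed to by val. When val
// points to a bool holding 1, this turns the stored byte into 3, which is
// not a valid bool value.
void set_bits_to_1(void* val){
  char *x = static_cast<char *>(val);

  for (int i = 0; i<2; i++ ){
    *x |= (1UL << i);
  }
}

int main(int argc,char *argv[])
{
  // Converting the integer 3 to bool yields true; the compiler stores a valid bool.
  bool T = 3;

  cout <<"       value   bits " <<endl;
  cout <<"       -----   ---- " <<endl;
  cout <<"    T:   "<< T <<"     "<< bitset<4>(T)<<endl;

  // After this call, every read of T is undefined behavior.
  set_bits_to_1(&T);

  bitset<4> bit_T = bitset<4>(T);
  cout <<"after bit change"<<endl;
  cout <<"    T:   "<< T <<"     "<< bit_T<<endl;

  if (T ){
    cout <<"T is true" <<endl;
  }

  if ( T == false){
    cout <<"T is false" <<endl;
  }
}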

// Fortran function that is not compatible with C++ when compiled with ifort.

       logical*1 function return_true()
         implicit none

         return_true = 1;

       end function return_true


asked May 29 '19 21:05

BY408

2 Answers

In C++ the bit representation (and even the size) of a bool is implementation-defined; it is generally implemented as a char-sized type whose only valid values are 0 and 1.

If you set it to anything other than the allowed values (in this specific case by aliasing a bool through a char and modifying its bit representation), you are breaking the rules of the language, so anything can happen. In particular, the standard explicitly states that a "broken" bool may behave as both true and false (or as neither true nor false) at the same time:

Using a bool value in ways described by this International Standard as “undefined,” such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false

(C++11, [basic.fundamental], note 47)

In this particular case, you can see how it ended up in this bizarre situation: the first if gets compiled to

    movzx   eax, BYTE PTR [rbp-33]
    test    al, al
    je      .L22

which loads T into eax (with zero extension) and skips the print if it is all zero; the second if instead becomes

    movzx   eax, BYTE PTR [rbp-33]
    xor     eax, 1
    test    al, al
    je      .L23

The test if(T == false) is transformed into if(T^1), which flips only the low bit. For a valid bool (0 or 1) this is exactly !T, but for your "broken" one the stored 3 becomes 2, which is still nonzero, so this body runs as well.
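A minimal sketch of why both messages print: the two functions below model the two -O0 code sequences above as plain integer operations on the byte stored in T. The function names are hypothetical, the model assumes the gcc 6.3 -O0 code generation shown here, and it works on unsigned char so the model itself never creates an invalid bool.

```cpp
// `if (T)`  ->  movzx eax, BYTE PTR [...]; test al, al; je ...
// The body runs whenever the stored byte is nonzero.
constexpr bool if_T_runs_body(unsigned char raw) {
    return raw != 0;
}

// `if (T == false)`  ->  movzx eax, BYTE PTR [...]; xor eax, 1; test al, al; je ...
// Only the low bit is flipped, so the body runs whenever raw ^ 1 is nonzero.
constexpr bool if_T_eq_false_runs_body(unsigned char raw) {
    return (raw ^ 1u) != 0;
}
```

With a stored byte of 3, both predicates come out true, which matches the program printing both "T is true" and "T is false".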

Note that this odd sequence is generated only at low optimization levels; at higher levels it generally boils down to a zero/nonzero check, and a sequence like yours is likely to become a single test and conditional branch. You will still get bizarre behavior in other contexts, e.g. when adding bool values to integers:

int foo(bool b, int i) {
    return i + b;
}

becomes

foo(bool, int):
        movzx   edi, dil
        lea     eax, [rdi+rsi]
        ret

where dil is "trusted" to be 0/1.
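As a rough sketch (hypothetical function names, assuming the code generation shown above), this is what the compiled foo effectively computes when the byte behind b holds 3, contrasted with normalizing the byte before it is treated as a boolean:

```cpp
// Model of `return i + b;` as compiled: the byte is zero-extended and added
// as-is, because the compiler trusts it to be 0 or 1.
constexpr int foo_as_compiled(unsigned char raw_b, int i) {
    return i + raw_b;
}

// Normalizing the byte to a proper 0/1 first gives the intended result.
constexpr int foo_normalized(unsigned char raw_b, int i) {
    return i + (raw_b != 0 ? 1 : 0);
}
```

For a stored 3, the first version adds 3 instead of 1.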

If your program is pure C++, the solution is simple: don't break bool values this way. Avoid messing with their bit representation and everything will go well. In particular, even when you assign an integer to a bool, the compiler emits the code needed to make sure the result is a valid bool, so your bool T = 3 is perfectly safe and T ends up holding a proper true.
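For contrast, a minimal sketch of the well-defined paths: converting an integer value to bool, and reading raw bytes as unsigned char before converting them. The helper names are made up for illustration.

```cpp
#include <cstring>

// Value conversion: any nonzero integer becomes true, zero becomes false,
// exactly as in `bool T = 3;`.
constexpr bool to_bool(int n) {
    return static_cast<bool>(n);
}

// If all you have is a raw byte (e.g. from a buffer), copy it into an
// unsigned char and convert the value, instead of copying it into a bool.
inline bool bool_from_byte(const void* p) {
    unsigned char byte;
    std::memcpy(&byte, p, 1);
    return byte != 0;
}
```

Converting the resulting bool back to int always yields 0 or 1.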

If instead you need to interoperate with code written in other languages that may not share the same idea of what a bool is, avoid bool in "boundary" code and marshal the value as an appropriately sized integer; it works just as well in conditionals and similar contexts.
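A minimal sketch of such marshalling, assuming a one-byte foreign logical and a nonzero-is-true convention for incoming values; the type alias and function names are hypothetical, and the right convention depends on the foreign compiler and its flags:

```cpp
// Hypothetical boundary type standing in for a one-byte foreign logical.
using foreign_logical1 = signed char;

// Incoming: treat any nonzero byte as true (one possible convention; some
// compilers instead look only at the low bit).
constexpr bool from_foreign(foreign_logical1 v) {
    return v != 0;
}

// Outgoing: always produce the canonical 0/1 encoding.
constexpr foreign_logical1 to_foreign(bool b) {
    return b ? 1 : 0;
}
```

A foreign function returning a one-byte logical would then be declared on the C++ side with foreign_logical1 as its return type, and its result passed through from_foreign before being used as a bool.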

Update about the Fortran/interoperability side of the issue

Disclaimer: all I know of Fortran is what I read this morning in standard documents, plus the fact that I have some punched cards with Fortran listings that I use as bookmarks, so go easy on me.

First of all, this kind of language interoperability stuff isn't part of the language standards, but of the platform ABI. As we are talking about Linux x86-64, the relevant document is the System V x86-64 ABI.

Now, nowhere does the ABI specify that the C _Bool type (which is defined to be the same as C++ bool at 3.1.2 note †) has any kind of compatibility with Fortran LOGICAL; in particular, table 9.2 in section 9.2.2 specifies that "plain" LOGICAL is mapped to signed int. About TYPE*N types it says that

The “TYPE*N” notation specifies that variables or aggregate members of type TYPE shall occupy N bytes of storage.

(ibid.)

There's no equivalent type explicitly specified for LOGICAL*1, and that's understandable: it's not even standard. Indeed, if you try to compile a Fortran program containing a LOGICAL*1 in Fortran 95 compliant mode, you get diagnostics about it, both by ifort

./example.f90(2): warning #6916: Fortran 95 does not allow this length specification.   [1]
    logical*1, intent(in) :: x
------------^

and by gfort

./example.f90:2:13:
     logical*1, intent(in) :: x
             1
Error: GNU Extension: Nonstandard type declaration LOGICAL*1 at (1)

So the waters are already muddied; combining the two rules above, I'd go for signed char to be safe.

However: the ABI also specifies:

The values for type LOGICAL are .TRUE. implemented as 1 and .FALSE. implemented as 0.

So, if you have a program that stores anything besides 1 and 0 in a LOGICAL value, you are already out of spec on the Fortran side! You say:
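If you want to enforce that rule at the boundary instead of silently normalizing, a sketch along these lines could flag out-of-spec values. The names are hypothetical, and it assumes the plain LOGICAL to signed int mapping from table 9.2:

```cpp
// Classification of an incoming LOGICAL value under the ABI rule that
// .TRUE. is 1 and .FALSE. is 0.
enum class logical_status { is_false, is_true, out_of_spec };

constexpr logical_status classify_abi_logical(int v) {
    return v == 0 ? logical_status::is_false
         : v == 1 ? logical_status::is_true
                  : logical_status::out_of_spec;  // e.g. 3 or -1
}
```

Anything classified as out_of_spec indicates that the Fortran side has already stepped outside the ABI.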

A fortran logical*1 has same representation as bool, but in fortran if bits are 00000011 it is true, in C++ it is undefined.

This last statement is not true: the Fortran standard is representation-agnostic, and the ABI explicitly says the contrary. You can easily see this in action by checking gfort's output for a LOGICAL comparison:

integer function logical_compare(x, y)
    logical, intent(in) :: x
    logical, intent(in) :: y
    if (x .eqv. y) then
        logical_compare = 12
    else
        logical_compare = 24
    end if
end function logical_compare

becomes

logical_compare_:
        mov     eax, DWORD PTR [rsi]
        mov     edx, 24
        cmp     DWORD PTR [rdi], eax
        mov     eax, 12
        cmovne  eax, edx
        ret

You'll notice that there's a straight cmp between the two values, without normalizing them first (unlike ifort, which is more conservative in this regard).
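To see what the missing normalization means, here is a small model of the gfort code above (a direct comparison of the stored integers) next to a normalizing comparison; both functions are illustrative and not taken from any compiler:

```cpp
// Model of the compiled .eqv.: compare the two stored 32-bit values directly.
constexpr bool eqv_as_compiled(int x, int y) {
    return x == y;
}

// Normalizing comparison for contrast: nonzero is treated as true.
constexpr bool eqv_normalized(int x, int y) {
    return (x != 0) == (y != 0);
}
```

For valid 0/1 inputs the two agree; for an out-of-spec 3 compared against 1, the compiled version reports the values as different even though a nonzero-is-true reading would call both .TRUE..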

Even more interesting: regardless of what the ABI says, ifort by default uses a nonstandard representation for LOGICAL; this is explained in the -fpscomp logicals switch documentation, which also specifies some interesting details about LOGICAL and cross-language compatibility:

Specifies that integers with a non-zero value are treated as true, integers with a zero value are treated as false. The literal constant .TRUE. has an integer value of 1, and the literal constant .FALSE. has an integer value of 0. This representation is used by Intel Fortran releases before Version 8.0 and by Fortran PowerStation.

The default is fpscomp nologicals, which specifies that odd integer values (low bit one) are treated as true and even integer values (low bit zero) are treated as false.

The literal constant .TRUE. has an integer value of -1, and the literal constant .FALSE. has an integer value of 0. This representation is used by Compaq Visual Fortran. The internal representation of LOGICAL values is not specified by the Fortran standard. Programs which use integer values in LOGICAL contexts, or which pass LOGICAL values to procedures written in other languages, are non-portable and may not execute correctly. Intel recommends that you avoid coding practices that depend on the internal representation of LOGICAL values.

(emphasis added)
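A sketch of the two conventions described in the quoted documentation, expressed as predicates on the stored integer value; the names are hypothetical and only encode what the quote states:

```cpp
// -fpscomp logicals: nonzero is true, zero is false; .TRUE. is stored as 1.
constexpr bool is_true_fpscomp_logicals(int v) { return v != 0; }
constexpr int true_fpscomp_logicals = 1;

// Default (-fpscomp nologicals): odd is true, even is false; .TRUE. is -1.
constexpr bool is_true_nologicals(int v) { return (v & 1) != 0; }
constexpr int true_nologicals = -1;
```

The same stored value can mean different things: 2 is true under the first convention and false under the second. Note also that under the default convention .TRUE. is -1, which in a one-byte LOGICAL*1 would be the byte 0xFF; on typical implementations such as gcc on x86-64, where bool is expected to hold 0 or 1, that is not a valid bool representation.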

Now, the internal representation of a LOGICAL normally shouldn't be a problem, as, from what I gather, if you play "by the rules" and don't cross language boundaries you aren't going to notice. For a standard-compliant program there's no "straight conversion" between INTEGER and LOGICAL; the only ways I see to shove an INTEGER into a LOGICAL seem to be TRANSFER, which is intrinsically non-portable and gives no real guarantees, or the non-standard INTEGER <-> LOGICAL conversion on assignment.

The latter is documented by gfort to always map nonzero to .TRUE. and zero to .FALSE., and in all cases you can see code being generated to make this happen (though it is convoluted in the case of ifort with the legacy representation), so it seems you cannot shove an arbitrary integer into a LOGICAL this way.

logical*1 function integer_to_logical(x)
    integer, intent(in) :: x
    integer_to_logical = x
    return
end function integer_to_logical

integer_to_logical_:
        mov     eax, DWORD PTR [rdi]
        test    eax, eax
        setne   al
        ret

The reverse conversion for a LOGICAL*1 is a straight integer zero-extension (gfort), so, to honor the contract in the documentation linked above, it clearly expects the LOGICAL value to be 0 or 1.

But in general, the situation for these conversions is a bit of a mess, so I'd just stay away from them.

So, long story short: avoid putting INTEGER data into LOGICAL values, as it is bad even in Fortran, and make sure to use the correct compiler flag to get the ABI-compliant representation for booleans, and interoperability with C/C++ should be fine. But to be extra safe, I'd just use plain char on the C++ side.

Finally, from what I gather from the documentation, ifort has some built-in support for interoperability with C, including booleans; you may try to leverage it.

answered Oct 24 '22 07:10

Matteo Italia

This is what happens when you violate your contract with both the language and the compiler.

You probably heard somewhere that "zero is false" and "non-zero is true". That holds when you stay within the rules of the language, converting an int to a bool (or vice versa) by value.

It does not hold when you start messing with bit representations. In that case, you break your contract, and enter the realm of (at the very least) implementation-defined behaviour.

Simply don't do that.

It's not up to you how a bool is stored in memory. It's up to the compiler. If you want to change a bool's value, either assign true/false, or assign an integer and use the proper conversion mechanisms provided by C++.

The C++ standard used to actually give a specific call-out to how using bool in this manner is naughty and bad and evil ("Using a bool value in ways described by this document as 'undefined', such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false."), though it was removed in C++20 for editorial reasons.

answered Oct 24 '22 08:10

Lightness Races in Orbit
