Why is casting from char to std::byte potentially undefined behavior?

C++17 requires std::byte to be an enum class:

enum class byte : unsigned char {};

We may want to use std::byte instead of one of the char types to represent raw memory, since it is more type-safe, has its own byte-specific operators defined, and can't promote to int out of the blue like the char types do. Converting std::byte to other integer types requires an explicit cast or std::to_integer. However, many sources still give us char (or, more likely, whole buffers of char), so we may want to convert it:

#include <cstddef>  // std::byte

void fn(char c)
{
    std::byte b = static_cast<std::byte>(c);
    // ... that may invoke undefined behavior, read below
}

The signedness of char is implementation-defined, so std::numeric_limits<char>::is_signed may be true. Therefore c above may hold negative values, which are outside the range of unsigned char.
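A minimal sketch of that point, assuming an 8-bit char; the helper names char_is_signed and to_unsigned are made up for illustration. It shows that the char-to-unsigned-char conversion itself is always well-defined, whatever the signedness of char:

```cpp
#include <limits>

// Hypothetical helper: reports whether plain char is signed on this
// implementation (the answer is implementation-defined).
constexpr bool char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}

// Converting char to unsigned char is always well-defined: the result is
// the value reduced modulo 2^N, where N is the width of unsigned char.
constexpr unsigned char to_unsigned(char c) {
    return static_cast<unsigned char>(c);
}

// On common implementations '\xFF' is -1 where char is signed and 255
// where it is unsigned; either way the conversion yields 255.
static_assert(to_unsigned('\xFF') == 255, "assumes 8-bit char");
```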

In the C++17 standard, 8.2.9 Static cast [expr.static.cast] paragraph 10 says:

A value of integral or enumeration type can be explicitly converted to a complete enumeration type. The value is unchanged if the original value is within the range of the enumeration values (10.2). Otherwise, the behavior is undefined.

From 10.2 we can see that, for an enumeration with a fixed underlying type, the mentioned range is the range of the underlying type. Therefore, to avoid undefined behavior, we have to write more code. For example, we can add a cast to unsigned char to get the well-defined modular arithmetic of an unsigned conversion:

void fn(char c)
{
    std::byte b = static_cast<std::byte>(static_cast<unsigned char>(c));
    // ... now we have done it in portable manner?
}
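Since the data usually arrives as a whole buffer of char, the same double cast can be applied per element. This is only a sketch: to_bytes is a hypothetical helper, not something from the standard library, and it copies the data rather than reinterpreting it in place.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: copies a char buffer into a vector of std::byte.
std::vector<std::byte> to_bytes(std::string_view chars) {
    std::vector<std::byte> out(chars.size());
    std::transform(chars.begin(), chars.end(), out.begin(), [](char c) {
        // Go through unsigned char first so that a negative char value
        // is reduced modulo 2^N before it reaches the enumeration type.
        return static_cast<std::byte>(static_cast<unsigned char>(c));
    });
    return out;
}
```

A call such as to_bytes("abc") would then yield three bytes whose std::to_integer<int> values match the unsigned char values of the characters.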

Did I misunderstand something? Isn't that overly complicated and restrictive? Why can't an enum class with an unsigned underlying type follow modular arithmetic like its underlying type does? Note that the whole chain of casts is most likely compiled into nothing anyway. When char is signed, it has to be two's complement since C++14, so its bit representation has to be the same as after the modular conversion to unsigned char. Who benefits from this formal undefined behavior, and how?

Öö Tiib asked Sep 28 '18 11:09


1 Answer

This is going to be fixed in the next standard:

A value of integral or enumeration type can be explicitly converted to a complete enumeration type. If the enumeration type has a fixed underlying type, the value is first converted to that type by integral conversion, if necessary, and then to the enumeration type. If the enumeration type does not have a fixed underlying type, the value is unchanged if the original value is within the range of the enumeration values ([dcl.enum]), and otherwise, the behavior is undefined
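As a rough illustration of the revised wording, assuming a compiler that implements it and an 8-bit unsigned char (the function names direct and two_step are made up here), the single cast and the hand-written double cast should agree:

```cpp
#include <cstddef>

// std::byte has a fixed underlying type (unsigned char), so under the
// revised wording this cast first converts v to unsigned char
// (modulo 2^N) and then to std::byte ...
inline std::byte direct(int v) {
    return static_cast<std::byte>(v);
}

// ... which makes it equivalent to spelling out both steps by hand.
inline std::byte two_step(int v) {
    return static_cast<std::byte>(static_cast<unsigned char>(v));
}
```

With these, direct(-1) and two_step(-1) would both hold the value 255, and direct(300) would hold 300 mod 256 = 44.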

Here's the rationale behind the change from (C++11) unspecified to (C++17) undefined: 

Although issue 1094 clarified that the value of an expression of enumeration type might not be within the range of the values of the enumeration after a conversion to the enumeration type (see 8.2.9 [expr.static.cast] paragraph 10), the result is simply an unspecified value. This should probably be strengthened to produce undefined behavior, in light of the fact that undefined behavior makes an expression non-constant.
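To make the constant-expression argument concrete, here is a small sketch using a hypothetical unscoped enumeration Color without a fixed underlying type. The out-of-range line is left commented out, because whether a given compiler actually rejects it is an assumption that varies by compiler and version.

```cpp
// Hypothetical enumeration without a fixed underlying type. Its range of
// enumeration values is 0..3: the smallest bit-field that can hold the
// enumerators 0, 1 and 2.
enum Color { Red, Green, Blue };

// In range (although not a named enumerator): a valid constant expression.
constexpr Color in_range = static_cast<Color>(3);

// Out of range: undefined behavior under the C++17 wording, so the
// initializer is not a constant expression and a compiler may reject it.
// constexpr Color out_of_range = static_cast<Color>(7);
```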

And here's the rationale behind the C++2a fix:

The specifications of std::byte (21.2.5 [support.types.byteops]) and bitmask (20.4.2.1.4 [bitmask.types]) have revealed a problem with the integral conversion rules, according to which both those specifications have, in the general case, undefined behavior. The problem is that a conversion to an enumeration type has undefined behavior unless the value to be converted is in the range of the enumeration.

For enumerations with an unsigned fixed underlying type, this requirement is overly restrictive, since converting a large value to an unsigned integer type is well-defined.

geza answered Nov 14 '22 11:11