
Why does unsigned char have different default initialization behaviour than other data types?

I am reading through the cppreference page on default initialization and I noticed a section that states something along these lines:

//UB
int x;
int y = x;        
   
//Defined and ok
unsigned char c;
unsigned char d = c;

The same rule that applies to unsigned char applies to std::byte as well.

My question is: why does using the value of any other non-class variable (int, bool, char, etc.) before assigning it result in UB, as in the example above, but not for unsigned char? Why is unsigned char special?

The page I am reading for reference

Weston McNamara asked Jul 02 '21 04:07



2 Answers

The difference is not in initialisation behaviour. The value of uninitialised int is indeterminate and default initialisation leaves it indeterminate. The value of uninitialised unsigned char is indeterminate and default initialisation leaves it indeterminate. There is no difference there.

The difference is that behaviour of producing an indeterminate value of type int - or any other type besides the exceptional unsigned char or std::byte - is undefined (unless the value is discarded).

The exception for unsigned char (and later std::byte) was added to the language in C++14 when indeterminate value was properly defined (although since the change was a defect resolution, to my understanding it applies to the official standard at the time, C++11).
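As an illustration of what the exception permits (not a documented rationale), here is a minimal sketch of a byte-wise copy through unsigned char. The names Padded, copy_bytes and roundtrip are hypothetical. On most ABIs the padding bytes of `a` hold no determinate value, yet copying them through unsigned char lvalues appears to fall under the exception; in practice std::memcpy does the same job.

```cpp
#include <cstddef>

// Hypothetical aggregate; on most ABIs padding bytes follow `tag`.
struct Padded {
    char tag;
    int value;
};

// Copies n bytes, reading and writing only through unsigned char lvalues.
void copy_bytes(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];  // s[i] may be a padding byte; the copy is still allowed
    }
}

Padded roundtrip(char tag, int value) {
    Padded a{tag, value};
    Padded b;                      // members intentionally left uninitialized
    copy_bytes(&b, &a, sizeof a);  // every byte of b is overwritten here
    return b;
}
```

After the copy, b.tag and b.value hold the values from a, so reading them is fine; only the padding bytes remain without a determinate value.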

I could not find a documented rationale for that design choice. Here is a timeline of the definitions (all standard quotes are from drafts):

C89 - 1.6 DEFINITIONS OF TERMS

Undefined behavior --- behavior, upon use of ... indeterminately-valued objects


C89 - 3.5.7 Initialization - Semantics

... If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.

There are no exceptions for any type. You'll see why the C standard is relevant when reading the C++98 standard.

C++98 - [dcl.init]

... Otherwise, if no initializer is specified for an object, the object and its subobjects, if any, have an indeterminate initial value

There was no definition of what an indeterminate value is or of what happens when you use one. The intended meaning was presumably the same as in C89, but it is underspecified.

C99 - 3. Terms, definitions, and symbols - 3.17.2

3.17.2 indeterminate value

either an unspecified value or a trap representation

3.17.3 unspecified value

valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

NOTE An unspecified value cannot be a trap representation.


C99 - 6.2.6 Representations of types - 6.2.6.1 General

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.


C99 - J.2 Undefined behavior

The behavior is undefined in the following circumstances:

  • ...
  • The value of an object with automatic storage duration is used while it is indeterminate
  • A trap representation is read by an lvalue expression that does not have character type
  • A trap representation is produced by a side effect that modifies any part of the object using an lvalue expression that does not have character type
  • ...

C99 introduced the term trap representation; using one is also UB, just as using an indeterminate value is. Character types (which are char, unsigned char and signed char) don't have trap representations, and may be used to operate on trap representations of other types without UB.
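The C++ counterpart of "operating on other types through a character type" can be sketched as below. This is a minimal illustration, and the helper name object_bytes is hypothetical: it views the storage of any object as a sequence of unsigned char, which the aliasing rules permit for every type.

```cpp
#include <cstddef>
#include <vector>

// Returns a copy of the object representation of obj, one element per byte.
template <typename T>
std::vector<unsigned char> object_bytes(const T& obj) {
    const auto* first = reinterpret_cast<const unsigned char*>(&obj);
    return std::vector<unsigned char>(first, first + sizeof(T));
}
```

For example, object_bytes(1.5) yields sizeof(double) bytes whatever the bit pattern of the double happens to be.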

C++ core language issue - 616. Definition of “indeterminate value”

The C++ Standard uses the phrase “indeterminate value” without defining it. C99 defines it as “either an unspecified value or a trap representation.” Should C++ follow suit?

Proposed resolution (October, 2012):

[dcl.init] paragraph 12 as follows:

If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17 [expr.ass]). [Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2 [basic.start.init]. —end note] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

  • If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of:
      • the second or third operand of a conditional expression (5.16 [expr.cond]),
      • the right operand of a comma (5.18 [expr.comma]),
      • the operand of a cast or conversion to an unsigned narrow character type (4.7 [conv.integral], 5.2.3 [expr.type.conv], 5.2.9 [expr.static.cast], 5.4 [expr.cast]), or
      • a discarded-value expression (Clause 5 [expr]),

then the result of the operation is an indeterminate value.

If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the right operand of a simple assignment operator (5.17 [expr.ass]) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.

If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.

The proposed change was accepted as a defect resolution with some further changes (issue 1213) but has remained mostly the same (similar enough for purposes of this question). This is where the exception for unsigned char seems to have been introduced into C++. The core language issue has no public comments or notes about the rationale for the exception as far as I could find.
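To make the listed cases concrete, here is a minimal sketch that uses only some of the forms the quoted wording names (initialization, simple assignment, conditional operand, cast, discarded-value expression). The function name copy_then_replace is hypothetical, and the comments reflect my reading of the quoted text rather than an authoritative interpretation. Anything outside the list, such as c + 1, c == d or printing c, would still be UB because the value gets promoted or otherwise used.

```cpp
unsigned char copy_then_replace(bool flag) {
    unsigned char c;                 // default-initialized: indeterminate value
    unsigned char d = c;             // initializing an unsigned char object
    unsigned char e;
    e = c;                           // right operand of a simple assignment
    unsigned char f = flag ? d : e;  // second or third operand of ?:
    unsigned char g = static_cast<unsigned char>(f);  // cast to unsigned char
    static_cast<void>(g);            // discarded-value expression
    d = 7;                           // the indeterminate value is replaced
    return d;                        // determinate now, so it may be used freely
}
```

Compilers may still warn about the uninitialized reads; the warning is about style and risk, not about the behaviour being undefined in these particular forms.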

eerorika answered Sep 21 '22 09:09


Under C89 and C99, uninitialized values could have any bit pattern. If addressable locations have n bits, then unsigned char was guaranteed to have 2ⁿ possible values, so every possible bit pattern would be a valid value. Other types, however, would on some platforms be stored in ways where not all bit patterns were valid. The Standard imposed no requirements on what might happen if code attempted to read an object when the stored bit pattern didn't represent a valid value, so the question of whether reading an object of a type other than unsigned char would yield an Unspecified Value, or could trigger arbitrary behavior, would depend upon whether the implementation's specified representation of the type assigned valid values to all possible bit patterns.
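The counting argument can be checked at compile time. This is a small sketch with hypothetical helper names, and it assumes CHAR_BIT is smaller than the width of unsigned long long so the shift does not overflow. It confirms that unsigned char has exactly as many values as it has bit patterns, and contrasts that with bool, which has one value bit but occupies at least a whole byte.

```cpp
#include <climits>
#include <limits>

// Number of distinct bit patterns in one byte (assumes CHAR_BIT < 64).
constexpr unsigned long long bit_pattern_count() {
    return 1ULL << CHAR_BIT;
}

// Number of distinct values unsigned char can hold: 0 .. max.
constexpr unsigned long long unsigned_char_value_count() {
    return static_cast<unsigned long long>(
               std::numeric_limits<unsigned char>::max()) + 1ULL;
}

// Number of bits of storage a bool occupies.
constexpr unsigned long long bool_storage_bits() {
    return sizeof(bool) * CHAR_BIT;
}

static_assert(unsigned_char_value_count() == bit_pattern_count(),
              "every bit pattern of unsigned char is a valid value");
static_assert(std::numeric_limits<bool>::digits == 1,
              "bool has a single value bit");
static_assert(bool_storage_bits() > 1,
              "bool storage has more bit patterns than bool has values");
```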

The C11 Standard added an additional proviso which says that even implementations which specify that all objects, whether or not their address is taken, will always be stored in ways where all bit patterns would represent valid values may opt to behave in completely arbitrary fashion if an attempt is made to access an uninitialized object that isn't an unsigned char whose address is taken. Although no rationale document is published for C11 (unlike earlier versions), I think such changes stem from a lack of consensus about whether the Standard is supposed to only describe the behavior of 100% portable programs, or of a wider variety of practical programs. If a program is going to be run on a completely unspecified implementation, then it will be impossible to know what the effect of reading an uninitialized object would be except in the case specified by the C11 Standard. If it's going to be run on a known implementation, then it will be processed however that implementation decides to process it, whether or not the Standard mandates the behavior, so there should be no need to mandate anything in particular. Unfortunately, the authors of a Gratuitously "Clever" Compiler believe that when the Standard characterizes an action as "non-portable or erroneous" what it really means is "non-portable, and therefore erroneous", and excludes the possibility of "non-portable but correct on the intended target", despite the fact that such a notion directly contradicts the published Rationale documents for earlier versions of the Standard.

supercat answered Sep 19 '22 09:09