
What is the significance of the special language in the standard for lvalue-to-rvalue conversions of unsigned character types with indeterminate value?

In the C++14 draft standard (N3797), the section on lvalue-to-rvalue conversions reads as follows (emphasis mine):

4.1 Lvalue-to-rvalue-conversion [conv.lval]

  1. A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise the type of the prvalue is T.

  2. When an lvalue-to-rvalue conversion occurs in an unevaluated operand or a subexpression thereof (Clause 5) the value contained in the referenced object is not accessed. In all other cases, the result of the conversion is determined according to the following rules:

    • If T is a (possibly cv-qualified) std::nullptr_t then the result is a null pointer constant.
    • Otherwise, if T has class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.
    • Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.
    • Otherwise, if T is a (possibly cv-qualified) unsigned character type, and the object to which the glvalue refers contains an indeterminate value, and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value.
    • Otherwise, if the object to which the glvalue refers has an indeterminate value, the behavior is undefined.
    • Otherwise, the object indicated by the glvalue is the prvalue result.
  3. [Note: See also 3.10]

What's the significance of the paragraph on unsigned character types (the fourth bullet, shown in bold in the original post)?

If this paragraph were not here, then the situations in which it applies would lead to undefined behavior. Normally, I would expect that accessing an unsigned char value while it has an indeterminate value leads to undefined behavior. But, with this paragraph it means that

  • If I'm not actually accessing the character value, i.e. I'm immediately passing it to & or binding it to a reference, or
  • If the unsigned char does not have automatic storage duration,

then the conversion yields an unspecified value, and not undefined behavior.

Am I correct to conclude that this program:

#include <new>
#include <iostream>

// using T = int;
using T = unsigned char;

int main() {
  T * array = new T[500];
  for (int i = 0; i < 500; ++i) {
    std::cout << static_cast<int>(array[i]) << std::endl;
  }
  delete[] array;
}

is well-defined by the standard and must output a sequence of 500 unspecified ints, while the same program with T = int would have undefined behavior?


IIUC, one of the reasons to make it UB to read things with indeterminate values, is to allow aggressive dead store elimination by the optimizer. So, this paragraph may mean that a conforming compiler can't do as much optimization when working with unsigned char or arrays of unsigned char.

Assuming I understand correctly, what is the rationale for this rule? When is it useful to be able to read unsigned char that have indeterminate values, and get unspecified results instead of UB? I have this feeling that if they put this much effort into crafting this part of the rule, they had some motivation to help certain code examples that they cared about, or to be consistent with some other part of the standard, or simplify some other issue. But I have no idea what that might be.

asked Sep 06 '17 01:09 by Chris Beck


1 Answer

In many situations, code will write some parts of a POD struct or array without writing everything, and then use functions like memcpy or fwrite to copy or write out the entire thing without regard for which parts had assigned values and which did not. Such functions work on the object as a sequence of bytes, conceptually reading each one as an unsigned char, which is why the exception is tied to that type. Although it is not terribly common for C++ code to use byte-based operations to copy or write out the contents of aggregates, the ability to do so is a fundamental part of the language. Requiring that a program write definite values to all portions of an object, including those nothing will ever "care" about, would needlessly impair efficiency.
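As a rough illustration of the pattern described above, here is a minimal sketch. The `Record` type and the helper names `fill`, `same_contents`, and `round_trip` are hypothetical, made up for this example. Only part of the record is written, the whole object is then copied with memcpy, and afterwards only the bytes that were actually assigned are inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical record type, used only for illustration.
struct Record {
    std::size_t length;      // number of meaningful bytes in `name`
    unsigned char name[32];  // only the first `length` bytes are ever written
};

// Writes only the parts of the record the program cares about. The tail of
// `name` (and any padding) is left unwritten, so it stays indeterminate.
void fill(Record& r, const char* text) {
    assert(text != nullptr);
    std::size_t n = std::strlen(text);
    if (n > sizeof r.name) n = sizeof r.name;  // truncate to fit the buffer
    r.length = n;
    std::memcpy(r.name, text, n);
}

// Compares only the meaningful bytes; the unwritten tail is never inspected.
bool same_contents(const Record& a, const Record& b) {
    return a.length == b.length &&
           std::memcmp(a.name, b.name, a.length) == 0;
}

// Copies a partially written record wholesale and checks the meaningful part.
bool round_trip(const char* text) {
    // Default-initialized in dynamic storage: members start out indeterminate.
    std::unique_ptr<Record> src(new Record);
    std::unique_ptr<Record> dst(new Record);
    fill(*src, text);
    // Byte-wise copy of the entire object, written and unwritten parts alike.
    std::memcpy(dst.get(), src.get(), sizeof(Record));
    return same_contents(*src, *dst);
}
```

The alternative would be to zero the whole record before filling it, which costs extra writes to bytes that nothing ever reads. That is the inefficiency the answer refers to.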

answered Nov 06 '22 18:11 by supercat