In the C++14 standard (n3797), the section on lvalue to rvalue conversions reads as follows (emphasis mine):
4.1 Lvalue-to-rvalue conversion [conv.lval]
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise the type of the prvalue is T.
When an lvalue-to-rvalue conversion occurs in an unevaluated operand or a subexpression thereof (Clause 5) the value contained in the referenced object is not accessed. In all other cases, the result of the conversion is determined according to the following rules:
- If T is a (possibly cv-qualified) std::nullptr_t then the result is a null pointer constant.
- Otherwise, if T has class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.
- Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.
- Otherwise, if T is a (possibly cv-qualified) unsigned character type, and the object to which the glvalue refers contains an indeterminate value, and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value.
- Otherwise, if the object to which the glvalue refers has an indeterminate value, the behavior is undefined.
- Otherwise, the object indicated by the glvalue is the prvalue result.
[Note: See also 3.10]
What's the significance of the emphasized paragraph, the one about (possibly cv-qualified) unsigned character types?
If this paragraph were not here, then the situations in which it applies would lead to undefined behavior. Normally, I would expect that accessing an unsigned char value while it has an indeterminate value leads to undefined behavior. But with this paragraph, if the glvalue was the operand of a unary & or was bound to a reference, or if the unsigned char object does not have automatic storage duration, then the conversion yields an unspecified value, and not undefined behavior.
Am I correct to conclude that this program:
#include <new>
#include <iostream>
// using T = int;
using T = unsigned char;
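int main() {
    // new T[500] default-initializes the elements, so for these T
    // every element starts out with an indeterminate value.
    T * array = new T[500];
    for (int i = 0; i < 500; ++i) {
        std::cout << static_cast<int>(array[i]) << std::endl;
    }
    delete[] array;
}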
is well-defined by the standard, and must output a sequence of 500 unspecified ints, while the same program with T = int would have undefined behavior?
IIUC, one of the reasons to make it UB to read things with indeterminate values is to allow aggressive dead store elimination by the optimizer. So, this paragraph may mean that a conforming compiler can't do as much optimization when working with unsigned char or arrays of unsigned char.
Assuming I understand correctly, what is the rationale for this rule? When is it useful to be able to read unsigned char objects that have indeterminate values, and get unspecified results instead of UB? I have the feeling that if they put this much effort into crafting this part of the rule, they had some motivation: to help certain code examples that they cared about, to be consistent with some other part of the standard, or to simplify some other issue. But I have no idea what that might be.
In many situations, code will write some parts of a POD struct or array without writing everything, and then use functions like memcpy or fwrite to copy or write out the entire thing without regard for which parts had assigned values and which did not. Although it is not terribly common for C++ code to use byte-based operations to copy or write out the contents of aggregates, the ability to do so is a fundamental part of the language. Requiring that a program write definite values to all portions of an object, including those nothing will ever "care" about, would needlessly impair efficiency.
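As a rough sketch of that pattern (Record, fill_record and copy_keeps_written_parts are made-up names for illustration, not taken from any real codebase): only part of the payload is ever written, yet the whole object is copied bytewise, and the code afterwards only looks at the parts that were actually assigned.

```cpp
#include <cstring>

// Hypothetical record type: a small header plus a fixed-size buffer
// of which usually only the first `used` bytes are meaningful.
struct Record {
    int id;
    unsigned char used;
    unsigned char payload[32];
};

// Writes the header and the first three payload bytes only.
// payload[3..31] are deliberately left unwritten.
void fill_record(Record& r) {
    r.id = 42;
    r.used = 3;
    r.payload[0] = 'a';
    r.payload[1] = 'b';
    r.payload[2] = 'c';
}

// Copies the entire object with memcpy, unwritten bytes included,
// then checks only the parts that were assigned in fill_record.
bool copy_keeps_written_parts() {
    Record src;   // default-initialized: nothing has a definite value yet
    fill_record(src);

    Record dst;
    std::memcpy(&dst, &src, sizeof(Record));

    return dst.id == 42
        && dst.used == 3
        && std::memcmp(dst.payload, "abc", dst.used) == 0;
}
```

Nothing here ever inspects payload[3] onwards; the point is only that the copy itself does not have to care which bytes were written.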