Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does initialization entail lvalue-to-rvalue conversion? Is `int x = x;` UB?

The C++ standard contains a semi-famous example of "surprising" name lookup in 3.3.2, "Point of declaration":

int x = x; 

This initializes x with itself, which (being a primitive type) is uninitialized and thus has an indeterminate value (assuming it is an automatic variable).

Is this actually undefined behaviour?

According to 4.1 "Lvalue-to-rvalue conversion", it is undefined behaviour to perform lvalue-to-rvalue conversion on an uninitialized value. Does the right-hand x undergo this conversion? If so, would the example actually have undefined behaviour?

like image 663
Kerrek SB Avatar asked Feb 18 '13 11:02

Kerrek SB


2 Answers

UPDATE: Following the discussion in the comments, I added some more evidence at the end of this answer.


Disclaimer: I admit this answer is rather speculative. The current formulation of the C++11 Standard, on the other hand, does not seem to allow for a more formal answer.


In the context of this Q&A, it has emerged that the C++11 Standard fails to formally specify what value categories are expected by each language construct. In the following I will mostly focus on built-in operators, although the question is about initializers. Eventually, I will end up extending the conclusions I drew for the case of operators to the case of initializers.

In the case of built-in operators, in spite of the lack of a formal specification, (non-normative) evidences are found in the Standard that the intended specification is to let prvalues be expected wherever a value is needed, and when not specified otherwise.

For instance, a note in Paragraph 3.10/1 says:

The discussion of each built-in operator in Clause 5 indicates the category of the value it yields and the value categories of the operands it expects. For example, the built-in assignment operators expect that the left operand is an lvalue and that the right operand is a prvalue and yield an lvalue as the result. User-defined operators are functions, and the categories of values they expect and yield are determined by their parameter and return types

Section 5.17 on assignment operators, on the other hand, does not mention this. However, the possibility of performing an lvalue-to-rvalue conversion is mentioned, again in a note (Paragraph 5.17/1):

Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single compound assignment operator

Of course, if no rvalue were expected, this note would be meaningless.

Another evidence is found in 4/8, as pointed out by Johannes Schaub in the comments to linked Q&A:

There are some contexts where certain conversions are suppressed. For example, the lvalue-to-rvalue conversion is not done on the operand of the unary & operator. Specific exceptions are given in the descriptions of those operators and contexts.

This seems to imply that lvalue-to-rvalue conversion is performed on all operands of built-in operators, except when specified otherwise. This would mean, in turn, that rvalues are expected as operands of built-in operators unless specified otherwise.


CONJECTURE:

Even though initialization is not assignment, and therefore operators do not enter the discussion, my suspicion is that this area of the specification is affected by the very same problem described above.

Traces supporting this belief can be found even in Paragraph 8.5.2/5, about the initialization of references (for which the value of the lvalue initializer expression is not needed):

The usual lvalue-to-rvalue (4.1), array-to-pointer (4.2), and function-to-pointer (4.3) standard conversions are not needed, and therefore are suppressed, when such direct bindings to lvalues are done.

The word "usual" seems to imply that when initializing objects which are not of a reference type, lvalue-to-rvalue conversion is meant to apply.

Therefore, I believe that although requirements on the expected value category of initializers are ill-specified (if not completely missing), on the grounds of the evidences provided it makes sense to assume that the intended specification is that:

Wherever a value is required by a language construct, a prvalue is expected unless specified otherwise.

Under this assumption, an lvalue-to-rvalue conversion would be required in your example, and that would lead to Undefined Behavior.


ADDITIONAL EVIDENCE:

Just to provide further evidence to support this conjecture, let's assume it wrong, so that no lvalue-to-rvalue conversion is indeed required for copy-initialization, and consider the following code (thanks to jogojapan for contributing):

int y; int x = y; // No UB short t; int u = t; // UB! (Do not like this non-uniformity, but could accept it) int z; z = x; // No UB (x is not uninitialized) z = y; // UB! (Assuming assignment operators expect a prvalue, see above)        // This would be very counterintuitive, since x == y 

This non-uniform behavior does not make a lot of sense to me. What makes more sense IMO is that wherever a value is required, a prvalue is expected.

Moreover, as Jesse Good correctly points out in his answer, the key Paragraph of the C++ Standard is 8.5/16:

— Otherwise, the initial value of the object being initialized is the (possibly converted) value of the initializer expression. Standard conversions (Clause 4) will be used, if necessary, to convert the initializer expression to the cv-unqualified version of the destination type; no user-defined conversions are considered. If the conversion cannot be done, the initialization is ill-formed. [ Note: An expression of type “cv1 T” can initialize an object of type “cv2 T” independently of the cv-qualifiers cv1 and cv2.

However, while Jesse mainly focuses on the "if necessary" bit, I would also like to stress the word "type". The paragraph above mentions that standard conversions will be used "if necessary" to convert to the destination type, but does not say anything about category conversions:

  1. Will category conversions be performed if needed?
  2. Are they needed?

For what concerns the second question, as discussed in the original part of the answer, the C++11 Standard currently does not specify whether category conversions are needed or not, because nowhere it is mentioned whether copy-initialization expects a prvalue as an initializer. Thus, a clear-cut answer is impossible to give. However, I believe I provided enough evidence to assume this to be the intended specification, so that the answer would be "Yes".

As for the first question, it seems reasonable to me that the answer is "Yes" as well. If it were "No", obviously correct programs would be ill-formed:

int y = 0; int x = y; // y is lvalue, prvalue expected (assuming the conjecture is correct) 

To sum it up (A1 = "Answer to question 1", A2 = "Answer to question 2"):

          | A2 = Yes   | A2 = No |  ---------|------------|---------|  A1 = Yes |     UB     |  No UB  |   A1 = No  | ill-formed |  No UB  |  --------------------------------- 

If A2 is "No", A1 does not matter: there's no UB, but the bizarre situations of the first example (e.g. z = y giving UB, but not z = x even though x == y) show up. If A2 is "Yes", on the other hand, A1 becomes crucial; yet, enough evidence has been given to prove it would be "Yes".

Therefore, my thesis is that A1 = "Yes" and A2 = "Yes", and we should have Undefined Behavior.


FURTHER EVIDENCE:

This defect report (courtesy of Jesse Good) proposes a change that is aimed at giving Undefined Behavior in this case:

[...] In addition, 4.1 [conv.lval] paragraph 1 says that applying the lvalue-to-rvalue conversion to an “object [that] is uninitialized” results in undefined behavior; this should be rephrased in terms of an object with an indeterminate value.

In particular, the proposed wording for Paragraph 4.1 says:

When an lvalue-to-rvalue conversion occurs in an unevaluated operand or a subexpression thereof (Clause 5 [expr]) the value contained in the referenced object is not accessed. In all other cases, the result of the conversion is determined according to the following rules:

— If T is (possibly cv-qualified) std::nullptr_t, the result is a null pointer constant (4.10 [conv.ptr]).

— Otherwise, if the glvalue T has a class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.

— Otherwise, if the object to which the glvalue refers contains an invalid pointer value (3.7.4.2 [basic.stc.dynamic.deallocation], 3.7.4.3 [basic.stc.dynamic.safety]), the behavior is implementation-defined.

— Otherwise, if T is a (possibly cv-qualified) unsigned character type (3.9.1 [basic.fundamental]), and the object to which the glvalue refers contains an indeterminate value (5.3.4 [expr.new], 8.5 [dcl.init], 12.6.2 [class.base.init]), and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value. [Footnote: The value may be different each time the lvalue-to-rvalue conversion is applied to the object. An unsigned char object with indeterminate value allocated to a register might trap. —end footnote]

Otherwise, if the object to which the glvalue refers contains an indeterminate value, the behavior is undefined.

— Otherwise, if the glvalue has (possibly cv-qualified) type std::nullptr_t, the prvalue result is a null pointer constant (4.10 [conv.ptr]). Otherwise, the value contained in the object indicated by the glvalue is the prvalue result.

like image 184
13 revs Avatar answered Oct 01 '22 02:10

13 revs


An implicit conversion sequence of an expression e to type T is defined as being equivalent to the following declaration, using t as the result of the conversion (modulo value category, which will be defined depending on T), 4p3 and 4p6

T t = e; 

The effect of any implicit conversion is the same as performing the corresponding declaration and initialization and then using the temporary variable as the result of the conversion.

In clause 4, the conversion of an expression to a type always yields expressions with a specific property. For example, conversion of 0 to int* yields a null pointer value, and not just one arbitrary pointer value. The value category too is a specific property of an expression and its result is defined as follows

The result is an lvalue if T is an lvalue reference type or an rvalue reference to function type (8.3.2), an xvalue if T is an rvalue reference to object type, and a prvalue otherwise.

Hence we know that in int t = e;, the result of the conversion sequence is a prvalue, because int is a non-reference type. So if we provide a glvalue, we are in obvious need of a conversion. 3.10p2 further clarifies that to leave no doubt

Whenever a glvalue appears in a context where a prvalue is expected, the glvalue is converted to a prvalue; see 4.1, 4.2, and 4.3.

like image 45
Johannes Schaub - litb Avatar answered Oct 01 '22 00:10

Johannes Schaub - litb