
Reading an indeterminate value invokes UB? [duplicate]

Various esteemed, high-rep users on SO keep insisting that reading a variable with an indeterminate value "is always UB". So where exactly is this mentioned in the C standard?

It is very clear that an indeterminate value can be either an unspecified value or a trap representation:

3.19.2
indeterminate value
either an unspecified value or a trap representation

3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.

3.19.4
trap representation
an object representation that need not represent a value of the object type

It is also clear that reading a trap representation invokes undefined behavior, 6.2.6.1:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

However, an indeterminate value does not necessarily contain a trap representation. In fact, trap representations are very rare for systems using two's complement.

Where in the C standard does it actually say that reading an indeterminate value invokes undefined behavior?

I was reading the non-normative Annex J of C11 and found that this is indeed listed as one case of UB:

The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8).

However, the listed sections are irrelevant. 6.2.4 only states rules regarding lifetime and when a variable's value becomes indeterminate. Similarly, 6.7.9 covers initialization and states how a variable's value becomes indeterminate. 6.8 seems mostly irrelevant. None of these sections contains any normative text saying that accessing an indeterminate value can lead to UB. Is this a defect in Annex J?

There is however some relevant, normative text in 6.3.2.1 regarding lvalues:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

But that is a special case, which only applies to variables of automatic storage duration that never had their address taken. I have always thought that this section of 6.3.2.1 is the only case of UB regarding indeterminate values (that are not trap representations). But people keep insisting that "it is always UB". So where exactly is this mentioned?
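To make the distinction concrete, here is a minimal sketch of the two situations. The function names are hypothetical, and the problematic reads are deliberately left commented out so that the code as written stays well-defined:

```c
/* Sketch only: the questionable reads are commented out on purpose. */

/* 'a' never has its address taken, so it could have been declared
   'register'. Reading it before the assignment is the 6.3.2.1 case. */
int register_like_case(void)
{
    int a;
    /* return a; */  /* would be UB under the 6.3.2.1 sentence quoted above */
    a = 1;
    return a;
}

/* 'b' has its address taken, so that sentence no longer applies.
   A read before the store would see an indeterminate value, which is
   exactly the case this question asks about. */
int address_taken_case(void)
{
    int b;
    int *p = &b;
    /* int early = *p; */  /* not covered by 6.3.2.1; left out deliberately */
    *p = 2;
    return b;
}
```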

Lundin asked Nov 19 '22 12:11



1 Answer

As far as I know, there is nothing in the standard that says that using an indeterminate value is always undefined behavior.

The cases that are spelled out as invoking undefined behavior are:

  • If the value happens to be a trap representation.
  • If the indeterminate value belongs to an object of automatic storage duration.
  • If the value is a pointer to an object whose lifetime has ended.

As an example, the C standard specifies that the type unsigned char has no padding bits and therefore none of its values can ever be a trap representation.
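As a small sanity check of that property, the sketch below counts the value bits in UCHAR_MAX and compares the count with CHAR_BIT. The function name is hypothetical. On a conforming implementation the two should always match, since every bit of an unsigned char is a value bit:

```c
#include <limits.h>

/* Returns 1 if UCHAR_MAX has exactly CHAR_BIT value bits,
   i.e. unsigned char has no padding bits. */
int uchar_uses_all_bits(void)
{
    int bits = 0;
    /* Shift the maximum value right until it is exhausted,
       counting one value bit per step. */
    for (unsigned char m = UCHAR_MAX; m != 0; m >>= 1)
        bits++;
    return bits == CHAR_BIT;
}
```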

Portable implementations of functions such as memcpy take advantage of this fact to perform a copy of any value, including indeterminate values. Those values could potentially be trap representations when used as values of a type that contains padding bits, but they are simply unspecified when used as values of unsigned char.
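A rough sketch of that idea follows. It is not how any particular library implements memcpy; byte_copy, struct sample and copy_sample_value are hypothetical names used only for illustration. The struct will typically contain padding bytes whose values are unspecified, yet copying them through unsigned char lvalues is fine:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct; most ABIs insert padding after 'tag'. */
struct sample {
    char tag;
    int  value;
};

/* Copies n bytes through unsigned char lvalues, the way a
   portable memcpy might. */
void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Copies a struct, padding bytes included, and returns the
   copied 'value' member (or -1 if the tag did not survive). */
int copy_sample_value(int value)
{
    struct sample a, b;
    a.tag = 'x';
    a.value = value;   /* any padding bytes in 'a' stay unspecified */
    byte_copy(&b, &a, sizeof a);
    return (b.tag == 'x') ? b.value : -1;
}
```

The padding bytes are only ever read as unsigned char, so no trap representation can be involved at any point in the copy.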


I believe that it is erroneous to assume that if something could invoke undefined behavior then it does invoke undefined behavior when the program has no safe way of checking. Consider the following example:

int read(int* array, int n, int i)
{
   /* Bounds check against n; the function has to trust that n is accurate. */
   if (0 <= i)
       if (i < n)
           return array[i];
   return 0;
}

In this case, the read function has no safe way of checking whether array really is of (at least) length n. Clearly, if the compiler considered these possible UB operations as definite UB, it would be nearly impossible to write any pointer code.
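To illustrate, here is the same logic under a hypothetical name (read_at, to avoid clashing with the POSIX read), together with two hypothetical callers that keep the contract by passing the real array length. Both calls are well-defined, which is why the compiler cannot treat the array access as UB just because some other caller might lie about n:

```c
/* Same logic as the function above, under a hypothetical name. */
int read_at(const int *array, int n, int i)
{
    if (0 <= i && i < n)
        return array[i];
    return 0;
}

/* Caller that honors the contract: n matches the real length
   and the index is in range. */
int demo_in_bounds(void)
{
    int data[3] = {10, 20, 30};
    return read_at(data, 3, 1);
}

/* Same contract, but with an index that the bounds check rejects. */
int demo_out_of_range(void)
{
    int data[3] = {10, 20, 30};
    return read_at(data, 3, 7);
}
```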

More generally, if the compiler cannot prove that something is UB, it has to assume that it isn't UB, otherwise it risks breaking conforming programs.


The only case where the possibility is treated like a certainty is that of objects of automatic storage. I think it's reasonable to assume that this is because those cases can be statically rejected, since all the information the compiler needs can be obtained through local flow analysis.

On the other hand, declaring it as UB for non-automatic storage objects would not give the compiler any useful information in terms of optimizations or portability (in the general case). Thus, the standard probably doesn't mention those cases because it wouldn't change anything in realistic implementations anyway.

Theodoros Chatzigiannakis answered Nov 22 '22 01:11
