Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Can code that will never be executed invoke undefined behavior?

The code that invokes undefined behavior (in this example, division by zero) will never get executed, is the program still undefined behavior?

int main(void) {     int i;     if(0)     {         i = 1/0;     }     return 0; } 

I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.

So, any ideas?

like image 797
Yu Hao Avatar asked Aug 22 '13 15:08

Yu Hao


People also ask

What happens undefined behavior?

In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres.

What is undefined behavior in programming?

In computer programming, undefined behaviour is defined as 'the result of compiling computer code which is not prescribed by the specs of the programming language in which it is written'.

What causes undefined Behaviour in C?

So, in C/C++ programming, undefined behavior means when the program fails to compile, or it may execute incorrectly, either crashes or generates incorrect results, or when it may fortuitously do exactly what the programmer intended.

Why does C++ allow undefined behavior?

C and C++ have undefined behavior, because nobody's defined an acceptable alternative that allows them to do what they're intended to do. C# and Java take a different approach, but that approach fits poorly (if at all) with the goals of C and C++.


1 Answers

Let's look at how the C standard defines the terms "behavior" and "undefined behavior".

References are to the N1570 draft of the ISO C 2011 standard; I'm not aware of any relevant differences in any of the three published ISO C standards (1990, 1999, and 2011).

Section 3.4:

behavior
external appearance or action

Ok, that's a bit vague, but I'd argue that a given statement has no "appearance", and certainly no "action", unless it's actually executed.

Section 3.4.3:

undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

It says "upon use" of such a construct. The word "use" is not defined by the standard, so we fall back to the common English meaning. A construct is not "used" if it's never executed.

There's a note under that definition:

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

So a compiler is permitted to reject your program at compile time if its behavior is undefined. But my interpretation of that is that it can do so only if it can prove that every execution of the program will encounter undefined behavior. Which implies, I think, that this:

if (rand() % 2 == 0) {     i = i / 0; } 

which certainly can have undefined behavior, cannot be rejected at compile time.

As a practical matter, programs have to be able to perform runtime tests to guard against invoking undefined behavior, and the standard has to permit them to do so.

Your example was:

if (0) {     i = 1/0; } 

which never executes the division by 0. A very common idiom is:

int x, y; /* set values for x and y */ if (y != 0) {     x = x / y; } 

The division certainly has undefined behavior if y == 0, but it's never executed if y == 0. The behavior is well defined, and for the same reason that your example is well defined: because the potential undefined behavior can never actually happen.

(Unless INT_MIN < -INT_MAX && x == INT_MIN && y == -1 (yes, integer division can overflow), but that's a separate issue.)

In a comment (since deleted), somebody pointed out that the compiler may evaluate constant expressions at compile time. Which is true, but not relevant in this case, because in the context of

i = 1/0; 

1/0 is not a constant expression.

A constant-expression is a syntactic category that reduces to conditional-expression (which excludes assignments and comma expressions). The production constant-expression appears in the grammar only in contexts that actually require a constant expression, such as case labels. So if you write:

switch (...) {     case 1/0:     ... } 

then 1/0 is a constant expression -- and one that violates the constraint in 6.6p4: "Each constant expression shall evaluate to a constant that is in the range of representable values for its type.", so a diagnostic is required. But the right hand side of an assignment does not require a constant-expression, merely a conditional-expression, so the constraints on constant expressions don't apply. A compiler can evaluate any expression that it's able to at compile time, but only if the behavior is the same as if it were evaluated during execution (or, in the context of if (0), not evaluated during execution().

(Something that looks exactly like a constant-expression is not necessarily a constant-expression, just as, in x + y * z, the sequence x + y is not an additive-expression because of the context in which it appears.)

Which means the footnote in N1570 section 6.6 that I was going to cite:

Thus, in the following initialization,
static int i = 2 || 1 / 0;
the expression is a valid integer constant expression with value one.

isn't actually relevant to this question.

Finally, there are a few things that are defined to cause undefined behavior that aren't about what happens during execution. Annex J, section 2 of the C standard (again, see the N1570 draft) lists things that cause undefined behavior, gathered from the rest of the standard. Some examples (I don't claim this is an exhaustive list) are:

  • A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment
  • Token concatenation produces a character sequence matching the syntax of a universal character name
  • A character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
  • An identifier, comment, string literal, character constant, or header name contains an invalid multibyte character or does not begin and end in the initial shift state
  • The same identifier has both internal and external linkage in the same translation unit

These particular cases are things that a compiler could detect. I think their behavior is undefined because the committee didn't want to, or couldn't, impose the same behavior on all implementations, and defining a range of permitted behaviors just wasn't worth the effort. They don't really fall into the category of "code that will never be executed", but I mention them here for completeness.

like image 57
Keith Thompson Avatar answered Sep 22 '22 05:09

Keith Thompson