Defining Undefined Behavior

Does there exist any implementation of C++ (and/or C) that guarantees that anytime undefined behavior is invoked, it will signal an error? Obviously, such an implementation could not be as efficient as a standard C++ implementation, but it could be a useful debugging/testing tool.

If such an implementation does not exist, then are there any practical reasons that would make it impossible to implement? Or is it just that no one has done the work to implement it yet?

Edit: To make this a little more precise: I would like to have a compiler that allows me to make the assertion, for a given run of a C++ program that ran to completion, that no part of that run involved undefined behavior.

Edward Loper asked Dec 28 '12 17:12



2 Answers

Yes, and no.

I am fairly certain that, for practical purposes, an implementation could make C++ a safe language, meaning every operation has well-defined behavior. Of course, this comes with a huge overhead, and there are probably some cases where it is simply infeasible, such as race conditions in multithreaded code.

Now, the problem is that this can't guarantee your code is well-defined on other implementations! That is, the same program could still invoke UB when built with a different compiler. For instance, consider the following code:

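#include <iostream>

int a;
int* b;  // zero-initialized global: null until foo() runs

int foo() {
  a = 5;
  b = &a;
  return 0;
}

int bar() {
  *b = a;  // dereferences a null pointer if bar() runs before foo()
  return 0;
}

int main() {
  std::cout << foo() << bar() << std::endl;
}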

According to the standard, the order in which foo and bar are called is up to the implementation. (C++17 later fixed the operand order for <<, but the same issue still arises for function arguments, whose evaluation order remains unspecified.) Now, in a safe implementation this order would have to be defined, likely as left-to-right evaluation. The problem is that evaluating right-to-left invokes UB, because bar would dereference b while it is still a null pointer, and this wouldn't be caught until you ran the program on an unsafe implementation. The safe implementation could compile every permutation of evaluation order or do some static analysis, but this quickly becomes infeasible and possibly undecidable.

So in conclusion, if such an implementation existed it would give you a false sense of security.

Pubby answered Sep 28 '22 08:09


The new C standard (C11) has an interesting list in its new Annex L, which carries the crude title "Analyzability". It singles out a subset of UB that it calls critical undefined behavior. This includes, among others:

  • An object is referred to outside of its lifetime (6.2.4).
  • A pointer is used to call a function whose type is not compatible with the referenced type.
  • The program attempts to modify a string literal.

All of these are kinds of UB that are impossible or very hard to detect, since they usually can't be fully checked at compile time. This is because a valid C (or C++) program is composed of several compilation units that may not know much about each other. For example, one unit may pass a pointer to a string literal into a function with a char* parameter that is defined in another unit, or, even worse, a program may cast away const-ness from a static variable and then write to it.

Jens Gustedt answered Sep 28 '22 09:09