Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is undefined behaviour allowed in C

I have been messing around trying to learn C lately. Coming from Java, it surprised me that you can perform certain operations declared as "undefined".

This just seems extremely unsafe to me. I understand it is the programmer's responsibility not to perform undefined operations, but why is it even allowed to start with? Why does the compiler not catch, for instance, array indices out of bounds, or even dangling pointers? You just end up accessing blocks of memory you never should access, with no (apparent) good reason.

As a comparison, Java makes extra sure you don't do anything stupid, throwing Exceptions around like hot cakes.

Surely there must be a reason why this is allowed? What is it?

ANSWER: To my understanding, the main reason is performance. Also, Java does have undefined behaviours, although not labeled as such.

EDIT: restricted question to C

like image 626
justbourv Avatar asked Apr 10 '16 00:04

justbourv


People also ask

Why does C allow undefined behavior?

It exists because of the syntax rules of C where a variable can be declared without init value. Some compilers assign 0 to such variables and some just assign a mem pointer to the variable and leave just like that. if program does not initialize these variables it leads to undefined behavior.

What type of behavior C is undefined?

According to the C standards, signed integer overflow is undefined behaviour too. A few compilers may trap the overflow condition when compiled with some trap handling options, while a few compilers simply ignore the overflow conditions (assuming that the overflow will never happen) and generate the code accordingly.

Is unspecified behavior undefined behavior?

Unspecified behavior is different from undefined behavior. The latter is typically a result of an erroneous program construct or data, and no requirements are placed on the translation or execution of such constructs.


2 Answers

Undefined behavior is not allowed, it's just not caught by the compiler.

The tradeoff here is between the speed and the safety. Many kinds of undefined behavior could be prevented at the expense of a few additional CPU cycles.

For example, you could prevent UB that happens when you read from memory that has been allocated but not initialized by having the compiled code write zeros into it. This, however, costs you a whole additional write into a memory, which is entirely unnecessary.

Similarly, one could prevent reading/writing past the end of an array by checking its bounds inside [] operator. However, this would cost you a few additional CPU cycles on each array access.

C++ designers decided that it is better to have speed and allow potential UB than to force everyone pay for what they do not need. This approach, however, is incompatible with Java's "write once, run anywhere" requirement, so designers of Java language insisted on fully defined behavior in nearly all situations.

like image 85
Sergey Kalinichenko Avatar answered Oct 12 '22 13:10

Sergey Kalinichenko


Originally, most forms of Undefined Behavior represented things which some implementations might trap, but other implementations might not. Because there was no way for the authors of the Standard to predict all the things a platform might do in case of a trap (including, literally, the possibility that a system would sound an alarm and lock up until an operator manually cleared the fault), the consequences of traps fell outside the jurisdiction of the C Standard, and thus almost every action for which some platform might conceivably cause a trap is--from the point of view of the Standard--considered "Undefined Behavior".

That should not be taken to imply that the authors of the Standard didn't believe implementations should try to behave sensibly for such things when practical. The authors of the C89 Standard noted, for example, that the majority of current systems of that era would define behavior for:

/* Assume USmall is half the size of "int" */
unsigned mult(USmall x, USmall y) { return x*y; }

which would in all cases, including those where the mathematical product of x and y was between INT_MAX+1 and UINT_MAX, be equivalent to (unsigned)x*y;. I see no reason to believe they wouldn't have expected that trend to continue.

Unfortunately, a new philosophy has become fashionable, based on the revisionist viewpoint that compiler writers only supported useful behaviors in cases not mandated by the Standard because they were too unsophisticated to do anything else. In gcc, for example, using optimization level 2 but no other non-default options, the above "mult" routine will sometimes generate bogus code in cases where the product would be between 0x80000000u and 0xFFFFFFFFu, even when running on platforms where such computations would historically have worked. This is supposedly being done in the name of "optimization"; it would be interesting to know how many of the "optimizations" such techniques end up performing are actually useful and could not have been achieved via safer means.

Historically, Undefined Behavior was a license for a C compiler to expose the behavior of the underlying platform; in cases where the underlying platform's behavior fit the programmer's needs, this allowed the programmer's requirements to be expressed in machine code more efficiently than if everything had to be done in ways defined by the Standard. Lately, however, it has been interpreted as license for compilers to implement behaviors which not only bear no relation to anything in the underlying platform nor to any plausible programmer expectations, but aren't even bound by laws of time and causality.

like image 27
supercat Avatar answered Oct 12 '22 15:10

supercat