Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it undefined behaviour to allocate overlarge stack structures?

Tags:

This is a C specification question.

We all know this is legal C and should work fine on any platform:

/* Stupid way to count the length of a number */
int count_len(int val) {
    char buf[256];
    return sprintf(buf, "%d", val);
}

But this is almost guaranteed to crash:

/* Stupid way to count the length of a number */
int count_len(int val) {
    char buf[256000000];
    return sprintf(buf, "%d", val);
}

The difference is that the latter program blows the stack and will probably crash. But, purely semantically, it really isn't any different than the previous program.

According to the C spec, is the latter program actually undefined behavior? If so, what distinguishes it from the former? If not, what in the C spec says it's OK for a conforming implementation to crash?

(If this differs between C89/C99/C11/C++*, this would be interesting too).

like image 872
nneonneo Avatar asked Apr 17 '14 19:04

nneonneo


3 Answers

Language standards for C(89,99,11) begin with a scope section with this wording (also found in some C++, C#, Fortran and Pascal standards):

This International Standard does not specify

  • the size or complexity of a program and its data that will exceed the capacity of any specific data-processing system or the capacity of a particular processor;
  • all minimal requirements of a data-processing system that is capable of supporting a conforming implementation.

The gcc compiler does offer an option to check for stack overflow at runtime

21.1 Stack Overflow Checking

For most operating systems, gcc does not perform stack overflow checking by default. This means that if the main environment task or some other task exceeds the available stack space, then unpredictable behavior will occur. Most native systems offer some level of protection by adding a guard page at the end of each task stack. This mechanism is usually not enough for dealing properly with stack overflow situations because a large local variable could “jump” above the guard page. Furthermore, when the guard page is hit, there may not be any space left on the stack for executing the exception propagation code. Enabling stack checking avoids such situations. To activate stack checking, compile all units with the gcc option -fstack-check. For example:

gcc -c -fstack-check package1.adb 

Units compiled with this option will generate extra instructions to check that any use of the stack (for procedure calls or for declaring local variables in declare blocks) does not exceed the available stack space. If the space is exceeded, then a Storage_Error exception is raised.

There was an attempt during the standardization process for C99 to make a stronger statement within the standard that while size and complexity are beyond the scope of the standard the implementer has a responsibility to document the limits.

The rationale was

The definition of conformance has always been a problem with the C Standard, being described by one author as "not even rubber teeth, more like rubber gums". Though there are improvements in C9X compared with C89, many of the issues still remain.

This paper proposes changes which, while not perfect, hopefully improve the situation.

The following wording was suggested for inclusion to section 5.2.4.1

  • translation or execution might fail if the size or complexity of a program or its data exceeds the capacity of the implementation.
  • The implementation shall document a way to determine if the size or complexity of a correct program exceeds or might exceed the capacity of the implementation.

    5.2.4.1. An implementation is always free to state that a given program is too large or too complex to be translated or executed. However, to stop this being a way to claim conformance while providing no useful facilities whatsoever, the implementer must show provide a way to determine whether a program is likely to exceed the limits. The method need not be perfect, so long as it errs on the side of caution. One way to do this would be to have a formula which converted values such as the number of variables into, say, the amount of memory the compiler would need. Similarly, if there is a limit on stack space, the formula need only show how to determine the stack requirements for each function call (assuming this is the only place the stack is allocated) and need not work through every possible execution path (which would be impossible in the face of recursion). The compiler could even have a mode which output a value for each function in the program.

The proposed wording did not make it into the C99 standard, and therefore this area remained outside the scope of the standard. Section 5.2.4.1 of C99 does list these limits

The implementation shall be able to translate and execute at least one program that contains at least one instance of every one of the following limits:

  • 127 nesting levels of blocks
  • 63 nesting levels of conditional inclusion
  • 12 pointer, array, and function declarators (in any combinations) modifying an arithmetic, structure, union, or incomplete type in a declaration
  • 63 nesting levels of parenthesized declarators within a full declarator
  • 63 nesting levels of parenthesized expressions within a full expression
  • 63 significant initial characters in an internal identifier or a macro name (each universal character name or extended source character is considered a single character)
  • 31 significant initial characters in an external identifier (each universal character name specifying a short identifier of 0000FFFF or less is considered 6 characters, each universal character name specifying a short identifier of 00010000 or more is considered 10 characters, and each extended source character is considered the same number of characters as the corresponding universal character name, if any)
  • 4095 external identifiers in one translation unit
  • 511 identifiers with block scope declared in one block
  • 4095 macro identifiers simultaneously defined in one preprocessing translation unit
  • 127 parameters in one function definition
  • 127 arguments in one function call
  • 127 parameters in one macro definition
  • 127 arguments in one macro invocation
  • 4095 characters in a logical source line
  • 4095 characters in a character string literal or wide string literal (after concatenation)
  • 65535 bytes in an object (in a hosted environment only)
  • 15 nesting levels for #included files
  • 1023 case labels for a switch statement (excluding those for any nested switch statements)
  • 1023 members in a single structure or union
  • 1023 enumeration constants in a single enumeration
  • 63 levels of nested structure or union definitions in a single struct-declaration-list
like image 122
amdn Avatar answered Oct 26 '22 02:10

amdn


In C++, Annex B indicates that the maximum size of an object is an implementation-specific finite number. That would tend to limit arrays with automatic storage class.

However, I'm not seeing something specifically for space accumulated by all automatic variables on the call stack, which is where a stack overflow ought to be triggered. I'm also not seeing a recursion limit in Annex B, although that would be closely related.

like image 26
Ben Voigt Avatar answered Oct 26 '22 01:10

Ben Voigt


The C standard is silent on all issues relating to stack overflow. This is a bit strange since it's very vocal in just about every other corner of C programming. AFAIK there is no specification that a certain amount of automatic storage must be available, and no way of detecting or recovering from exhaustion of the space available for automatic storage. The abstract machine is assumed to have an unlimited amount of automatic storage.

like image 20
M.M Avatar answered Oct 26 '22 01:10

M.M