Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Out of the bounds in C++ and undefined behaviour

I know that in c++ access out of buffer bounds is undefined behaviour.
Here is example from cppreference:

int table[4] = {};
bool exists_in_table(int v)
{
    // return true in one of the first 4 iterations or UB due to out-of-bounds access
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}

But, I can't find according paragraph in c++ standard.
Can anyone point me out on concrete paragraph in standard where such case is explained?

like image 390
Andrew Moylan Avatar asked Jul 09 '21 12:07

Andrew Moylan


People also ask

What does undefined behavior mean in C?

When we run a code, sometimes we see absurd results instead of expected output. So, in C/C++ programming, undefined behavior means when the program fails to compile, or it may execute incorrectly, either crashes or generates incorrect results, or when it may fortuitously do exactly what the programmer intended.

What type of behavior C is undefined?

According to the C standards, signed integer overflow is undefined behaviour too. A few compilers may trap the overflow condition when compiled with some trap handling options, while a few compilers simply ignore the overflow conditions (assuming that the overflow will never happen) and generate the code accordingly.

What is undefined data type in C?

In computing (particularly, in programming), undefined value is a condition where an expression does not have a correct value, although it is syntactically correct. An undefined value must not be confused with empty string, Boolean "false" or other "empty" (but defined) values.

Does C have index out of bounds?

C does not check array bounds. In fact, a segmentation fault isn't specifically a runtime error generated by exceeding the array bounds. Rather, it is a result of memory protection that is provided by the operating system.


2 Answers

It's undefined behavior. We can juxtapose a couple of passages to be convinced of it. First, and I won't explicitly prove it, table[4] is *(table + 4). We need only ask ourselves the properties of the pointer value table + 4 and how it relates to the requirements of the indirection operator.

On the pointer, we have this passage:

[basic.compound]

3 Every value of pointer type is one of the following:

  • a pointer to an object or function (the pointer is said to point to the object or function), or
  • a pointer past the end of an object ([expr.add]), or
  • the null pointer value for that type, or
  • an invalid pointer value.

Our pointer is of the second bullet's type, not the first. As for the indirection operator:

[expr.unary.op]

1 The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T”, the type of the result is “T”.

I hope it's obvious from reading this paragraph that the operation is defined for a pointer of the category described by the first bullet in the preceding paragraph.

So we apply an operation to a pointer value for which its behavior is not defined. The result is undefined behavior.

like image 162
StoryTeller - Unslander Monica Avatar answered Oct 18 '22 01:10

StoryTeller - Unslander Monica


Subscript operator is defined through addition operator. The array decays to a pointer to first element in this identical expression, so rules of pointer arithmetic apply. Indirection operator is used on the hypothetical result of the addition.

[expr.sub]

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)), ...


In case where the array index is more than one past the last element i.e. E2 > std::size(E1) (which isn't the case in the example program), the hypothetical pointer arithmetic itself is undefined.

[expr.add]

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • If P evaluates to a null pointer value ... (does not apply)
  • Otherwise, if P points to an array element i of an array object x with n elements ([dcl.array]), the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) array element i+j of x if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) array element i−j of x if 0≤i−j≤n. (does not apply when i-j > n)
  • Otherwise, the behavior is undefined.

In case of E2 == std::size(E1) (which is the case in last iteration of the example), the hypothetical result of the addition is a pointer to one past the array and points to outside the storage of the array. The hypothetical pointer arithmetic is well defined.

[basic.compound]

A value of a pointer type that is a pointer ... past the end of an object represents ... the first byte in memory after the end of the storage occupied by the object

Access is defined in terms of objects. But there is no object there, nor is there even storage, and thus there isn't definition for the behaviour.

OK, there might in some cases be an unrelated object in the pointed memory address. Following note says that pointer past the end is not a pointer to such unrelated object sharing the address. I couldn't find which normative rule causes this.

[Note 2: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type, even if the unrelated object is located at that address. ...

Alternatively, we can look at the definition of indirection operator:

[expr.unary.op]

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type ... and the result is an lvalue referring to the object ... to which the expression points. ...

There is a contradiction because there is no object that could be referred to.


So, in conclusion:

int table[N] = {};
table[N] == 0; // UB, accessing non-existing object
table[N + 1];  // UB, [expr.add]
table + N;     // OK, one past last element
table[N];      // ¯\_(ツ)_/¯ See CWG 232
like image 41
eerorika Avatar answered Oct 18 '22 02:10

eerorika