 

Why Are Zero Length VLAs UB?

The 2011 standard explicitly states...

6.7.6.2 Array declarators (p5)

  If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *; otherwise, each time it is evaluated it shall have a value greater than zero. The size of each instance of a variable length array type does not change during its lifetime. Where a size expression is part of the operand of a sizeof operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated.

It's contrived, but the following code seems reasonable.

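#include <stddef.h>

size_t vla(const size_t x) {

  size_t a[x];   /* undefined behavior if x == 0: a VLA size must be > 0 */
  size_t y = 0;

  for (size_t i = 0; i < x; i++)
    a[i] = i;    /* fill element i (writing a[x] would be out of bounds) */

  for (size_t i = 0; i < x; i++)
    y += a[i % 2];

  return y;
}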

Clang seems to generate reasonable x64 assembly for it (without optimizations). Obviously indexing a zero-length VLA doesn't make sense, but accessing it out of bounds would already invoke undefined behavior on its own.
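For comparison, here is a minimal sketch of one way to keep the same computation well defined when a caller may pass 0. It assumes the early return is acceptable, since for x == 0 both loops would run zero times and the result is 0 anyway; the name vla_guarded is hypothetical.

```c
#include <stddef.h>

/* Same computation as vla(), but never declares a zero-length VLA.
   For x == 0 both loops would run zero times, so the sum is 0. */
size_t vla_guarded(const size_t x) {
  if (x == 0)
    return 0;            /* avoid size_t a[0], which is undefined behavior */

  size_t a[x];           /* x > 0 here, as 6.7.6.2 requires */
  size_t y = 0;

  for (size_t i = 0; i < x; i++)
    a[i] = i;

  for (size_t i = 0; i < x; i++)
    y += a[i % 2];       /* i % 2 == 1 only when i >= 1, i.e. x >= 2 */

  return y;
}
```

With this version, vla_guarded(4) sums a[0] + a[1] + a[0] + a[1] = 0 + 1 + 0 + 1 = 2.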

Why are zero length arrays undefined?

Jason asked Oct 26 '15 17:10


3 Answers

int i = 0;
int a[i], b[i];

Is a == b? It shouldn't be - they're different objects - but avoiding it is problematic. If you leave a gap between a and b unconditionally, you're wasting space in the i > 0 case. If you check whether i == 0 and only leave a gap then, you're wasting time in the i > 0 case.
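The "only leave a gap when i == 0" strategy can be sketched by rounding every requested length up to at least one element. This is an illustration under that assumption, not how any compiler actually lays out VLAs; padded_count and distinct_addresses are hypothetical names. The extra compare runs for every length, which is the time cost described above.

```c
#include <stddef.h>

/* Hypothetical helper: round a requested element count up to at least 1,
   so an "empty" array still occupies storage and gets its own address. */
static size_t padded_count(size_t n) {
  return n == 0 ? 1 : n;   /* this check is paid even when n > 0 */
}

/* Declares two arrays back to back using the padded length and reports
   whether they have distinct addresses. Every size is >= 1, so this is
   well defined; equality comparison of different objects is allowed. */
int distinct_addresses(size_t n) {
  int a[padded_count(n)], b[padded_count(n)];
  return a != b;
}
```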

It gets worse with multidimensional arrays:

int i = 0;
int a[2][i];

You can pad between two variables, but where could you pad here? There's no way to do it without breaking the invariant that sizeof (int[2][i]) == 2 * i * sizeof (int). If you don't pad, then a[0] and a[1] have the same address, and you're breaking a different important invariant.
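The two invariants mentioned here can be checked directly for a non-zero i. In this sketch (check_2d_invariants is a hypothetical name), the total size of int a[2][i] is exactly 2 * i * sizeof (int), leaving no room for padding, and a[1] starts i ints after a[0]. With i == 0 that stride would be 0, which is exactly the "same address" problem, so the sketch refuses to declare the array in that case.

```c
#include <stddef.h>

/* Returns 1 if, for int a[2][i] with i > 0, the size has no padding and
   the rows a[0] and a[1] are i ints apart; returns 0 for i == 0. */
int check_2d_invariants(size_t i) {
  if (i == 0)
    return 0;                       /* int a[2][0] would be undefined behavior */

  int a[2][i];
  int size_ok = sizeof a == 2 * i * sizeof(int);
  /* Both pointers point into the same object a, so the subtraction is valid. */
  int stride_ok = (char *)a[1] - (char *)a[0] == (ptrdiff_t)(i * sizeof(int));
  return size_ok && stride_ok;
}
```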

It's a headache that isn't worth defining.

user2357112 supports Monica answered Oct 16 '22 15:10


Although gcc supports zero-length arrays as an extension, so they are clearly useful, from a standard perspective they would seem to create some issues, since as it stands each object must have a unique address. We can see this from section 6.5.9 Equality operators of the draft C99 and C11 standards, which says:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.94)

So this would require a bit of special-casing, and most of their usefulness, such as flexible arrays, can be provided by alternative methods (C99 flexible array members, for instance).
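As one example of such an alternative, a C99 flexible array member covers the common gcc idiom of a zero-length trailing array (char data[0]) in a struct. The struct buffer type and buffer_new function below are hypothetical names used only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Standard replacement for the gcc "char data[0]" trailing-array idiom. */
struct buffer {
  size_t len;
  char data[];          /* flexible array member: no size, must be last */
};

/* Allocates a buffer holding a copy of the n bytes at src.
   Returns NULL if allocation fails; the caller frees the result. */
struct buffer *buffer_new(const char *src, size_t n) {
  struct buffer *b = malloc(sizeof *b + n);   /* header plus n trailing bytes */
  if (b == NULL)
    return NULL;
  b->len = n;
  memcpy(b->data, src, n);
  return b;
}
```

Unlike a zero-length array, the flexible array member never declares an object of size 0: the storage for data comes entirely from the extra bytes passed to malloc.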

It would also likely require changes in other places, as M.M. points out; for example, array-to-pointer decay in section 6.3.2.1 Lvalues, arrays, and function designators says:

[...]an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue[...]
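To make the decay rule concrete: for an array with at least one element, the decayed pointer is exactly &a[0], but a zero-length array would have no initial element for it to point at. A minimal sketch, with decays_to_first_element as a hypothetical name:

```c
#include <stddef.h>

/* Returns 1 if the decayed pointer equals &a[0] for an n-element VLA.
   For n == 0 there is no initial element, so no array is declared. */
int decays_to_first_element(size_t n) {
  if (n == 0)
    return 0;            /* int a[0] is not allowed */

  int a[n];
  int *p = a;            /* array-to-pointer decay, same as p = &a[0] */
  return p == &a[0];
}
```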

This seems like it would require several non-trivial changes for minimal added benefit.

Shafik Yaghmour answered Oct 16 '22 15:10


Looking at the C standard:

C11 6.7.6.2 Array declarators (p1):

[...] If the expression is a constant expression, it shall have a value greater than zero. [...]

(p5):

If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *; otherwise, each time it is evaluated it shall have a value greater than zero. [...]

4. Conformance:

If a "shall" or "shall not" requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined".

Therefore, declaring a zero size array leads to undefined behavior of the program.
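The two quoted paragraphs differ in one practical way: p1 sits in the Constraints part of 6.7.6.2, so a constant size of 0 must be diagnosed at compile time, while p5 is a semantic requirement whose violation gets no required diagnostic. For the constant case, a C11 _Static_assert can state the requirement with a readable message. TABLE_SIZE and table_length are hypothetical names in this sketch.

```c
#include <stddef.h>

#define TABLE_SIZE 8   /* hypothetical configuration constant */

/* A constant array size of 0 already violates a constraint; this assertion
   just makes the failure message explicit if TABLE_SIZE is ever set to 0. */
_Static_assert(TABLE_SIZE > 0, "array size must be greater than zero");

static int table[TABLE_SIZE];

size_t table_length(void) {
  return sizeof table / sizeof table[0];
}
```

No such compile-time check is possible for a VLA, whose size is only known at run time.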

haccks answered Oct 16 '22 16:10