One-dimensional access to a multidimensional array: is it well-defined behaviour?

I imagine we all agree that it is considered idiomatic C to access a true multidimensional array by dereferencing a (possibly offset) pointer to its first element in a one-dimensional fashion, e.g.:

    void clearBottomRightElement(int *array, int M, int N)
    {
        array[M*N-1] = 0;  // Pretend the array is one-dimensional
    }

    int mtx[5][3];
    ...
    clearBottomRightElement(&mtx[0][0], 5, 3);

However, the language-lawyer in me needs convincing that this is actually well-defined C! In particular:

1. Does the standard guarantee that the compiler won't put padding in-between e.g. mtx[0][2] and mtx[1][0]?

2. Normally, indexing off the end of an array (other than one-past the end) is undefined (C99, 6.5.6/8). So the following is clearly undefined:

        struct {
            int row[3];           // The object in question is an int[3]
            int other[10];
        } foo;
        int *p = &foo.row[7];     // ERROR: A crude attempt to get &foo.other[4];

    So by the same rule, one would expect the following to be undefined:

        int mtx[5][3];
        int (*row)[3] = &mtx[0];  // The object in question is still an int[3]
        int *p = &(*row)[7];      // Why is this any better?

    So why should this be defined?

        int mtx[5][3];
        int *p = &(&mtx[0][0])[7];

So what part of the C standard explicitly permits this? (Let's assume C99 for the sake of discussion.)

EDIT

Note that I have no doubt that this works fine in all compilers. What I'm querying is whether this is explicitly permitted by the standard.

544

asked Jun 09 '11 09:06

Oliver Charlesworth

2 Answers

All arrays (including multidimensional ones) are padding-free. Even if it's never explicitly mentioned, it can be inferred from sizeof rules.
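To make the sizeof argument concrete, here is a minimal sketch. It relies on C99 describing an array type as a contiguously allocated set of elements (6.2.5) and on sizeof yielding the total number of bytes in an array (6.5.3.4). The helper name `flat_element_count` is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of ints spanned by the question's 5x3 array,
   computed purely from sizeof. */
size_t flat_element_count(void)
{
    int mtx[5][3];

    /* Each array level is exactly "element count * element size",
       which leaves no room for padding between rows. */
    assert(sizeof mtx[0] == 3 * sizeof mtx[0][0]);
    assert(sizeof mtx == 5 * sizeof mtx[0]);

    return sizeof mtx / sizeof mtx[0][0];
}
```

If the two assertions hold, the whole array spans exactly 5 * 3 elements, so mtx[1][0] has to start where mtx[0][2] ends.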

Now, array subscripting is a special case of pointer arithmetic, and C99 section 6.5.6, §8 states clearly that behaviour is only defined if the pointer operand and the resulting pointer lie in the same array (or one element past), which makes bounds-checking implementations of the C language possible.
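One way to stay inside that rule, sketched under the assumption that the dimensions are known at compile time, is to split a flat index into a row and a column so that each subscript stays within the bounds of its own array. The names `ROWS`, `COLS` and `element_at` are hypothetical and not part of the question.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 5, COLS = 3 };  /* dimensions of the question's mtx */

/* Hypothetical helper: address of flat element k, computed with two
   in-bounds subscripts instead of one out-of-bounds offset. */
int *element_at(int (*mtx)[COLS], size_t k)
{
    assert(k < (size_t)ROWS * COLS);
    return &mtx[k / COLS][k % COLS];
}
```

For the question's example, `element_at(mtx, 7)` yields `&mtx[2][1]` without ever indexing past the end of an `int[3]`.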

This means that your example is, in fact, undefined behaviour. However, as most C implementations do not check bounds, it will work as expected: most compilers treat undefined pointer expressions like

mtx[0] + 5

identically to well-defined counterparts like

(int *)((char *)mtx + 5 * sizeof (int))

which is well-defined because any object (including the whole two-dimensional array) can always be treated as a one-dimensional array of type char.
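The byte-offset form above can be wrapped in a small helper. This is only a sketch that follows the reading of the standard given in this answer. The name `flat_element` and the fixed row width of 3 are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: address of flat element k, reached by stepping
   through the array's bytes rather than through its int[3] rows. */
int *flat_element(int (*mtx)[3], size_t rows, size_t k)
{
    assert(k < rows * 3);
    /* The arithmetic happens on the character view of the whole object. */
    return (int *)((unsigned char *)mtx + k * sizeof(int));
}
```

With `int mtx[5][3]`, the call `flat_element(mtx, 5, 5)` corresponds to the expression `(int *)((char *)mtx + 5 * sizeof (int))` shown above.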

On further meditation on the wording of section 6.5.6, splitting out-of-bounds access into seemingly well-defined subexpressions like

(mtx[0] + 3) + 2

reasoning that mtx[0] + 3 is a pointer to one element past the end of mtx[0] (making the first addition well-defined) as well as a pointer to the first element of mtx[1] (making the second addition well-defined) is incorrect:

Even though mtx[0] + 3 and mtx[1] + 0 are guaranteed to compare equal (see section 6.5.9, §6), they are semantically different. For example, the former can't be dereferenced and thus does not point to an element of mtx[1].
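The distinction can be illustrated with a short sketch. The function name `one_past_row0_equals_row1` is hypothetical. The comments mark which pointer may be dereferenced according to the reasoning above.

```c
#include <assert.h>

/* Returns nonzero if the one-past-the-end pointer of row 0 compares
   equal to the pointer to the first element of row 1. */
int one_past_row0_equals_row1(void)
{
    int mtx[5][3] = {{0}};
    int *past_row0 = mtx[0] + 3;  /* one past the end of mtx[0]: never dereferenced here */
    int *row1 = mtx[1] + 0;       /* first element of mtx[1]: may be dereferenced */

    assert(*row1 == 0);           /* fine, row1 points to an element of mtx[1] */
    return past_row0 == row1;     /* the comparison itself is covered by 6.5.9, §6 */
}
```

Equal addresses do not make the two pointers interchangeable: only `row1` is used for access, and `past_row0 + 2` would be the out-of-bounds arithmetic discussed above.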

139

answered Oct 30 '22 05:10

Christoph

The only obstacle to the kind of access you want to do is that objects of type int [5][3] and int [15] are not allowed to alias one another. Thus if the compiler is aware that a pointer of type int * points into one of the int [3] arrays of the former, it could impose array bounds restrictions that would prevent accessing anything outside that int [3] array.

You might be able to get around this issue by putting everything inside a union that contains both the int [5][3] array and the int [15] array, but I'm really unclear on whether the union hacks people use for type-punning are actually well-defined. This case might be slightly less problematic since you would not be type-punning individual cells, only the array logic, but I'm still not sure.
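For reference, the union layout described here might look like the sketch below. The names `matrix_views` and `read_flat` are hypothetical, and as noted above it remains unclear whether the standard actually sanctions this, so treat it as an illustration of the idea rather than a guaranteed fix.

```c
#include <assert.h>

/* Hypothetical union overlaying the two views of the same 15 ints. */
union matrix_views {
    int grid[5][3];  /* two-dimensional view */
    int flat[15];    /* one-dimensional view */
};

/* Reads element k through the one-dimensional member. */
int read_flat(const union matrix_views *m, int k)
{
    assert(k >= 0 && k < 15);
    return m->flat[k];
}
```

A caller would write through `m.grid[i][j]` and read through `read_flat(&m, i * 3 + j)`. No individual int is reinterpreted as another type, only the array structure around it.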

One special case that should be noted: if your type were unsigned char (or any char type), accessing the multi-dimensional array as a one-dimensional array would be perfectly well-defined. This is because the one-dimensional array of unsigned char that overlaps it is explicitly defined by the standard as the "representation" of the object, and is inherently allowed to alias it.
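A minimal sketch of that special case, assuming a hypothetical helper named `fill_flat`: the function walks the object representation of a two-dimensional unsigned char array as one flat run of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sets every byte of an object, viewed as a
   one-dimensional array of unsigned char, to the given value. */
void fill_flat(unsigned char *bytes, size_t count, unsigned char value)
{
    assert(bytes != NULL);
    for (size_t i = 0; i < count; ++i)
        bytes[i] = value;
}
```

Given `unsigned char grid[5][3];`, a call would be `fill_flat((unsigned char *)&grid, sizeof grid, 0xFF);`. The pointer is derived from the whole object, so the flat index runs over its representation rather than over a single row.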

answered Oct 30 '22 05:10

R.. GitHub STOP HELPING ICE
