What are the differences between a+i and &a[i] for pointer arithmetic in C++?

Tags:

c++

language-lawyer

pointer-arithmetic

Supposing we have:

char* a;
int   i;

Many introductions to C++ (like this one) suggest that the rvalues a+i and &a[i] are interchangeable. I naively believed this for several decades, until I recently stumbled upon the following text (here) quoted from [dcl.ref]:

in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.

In other words, "binding" a reference object to a null-dereference causes undefined behavior. Based on the context of the above text, one infers that merely evaluating &a[i] (within the offsetof macro) is considered "binding" a reference. Furthermore, there seems to be a consensus that &a[i] causes undefined behavior in the case where a=null and i=0. This behavior is different from a+i (at least in C++, in the a=null, i=0 case).

This leads to at least 2 questions about the differences between a+i and &a[i]:

First, what is the underlying semantic difference between a+i and &a[i] that causes this difference in behavior? Can it be explained in terms of general principles, not just "binding a reference to a null dereference object causes undefined behavior because this is a very specific case that everybody knows"? Is it that &a[i] might generate a memory access to a[i]? Or was the spec author just unhappy with null dereferences that day? Or something else?

Second, besides the case where a=null and i=0, are there any other cases where a+i and &a[i] behave differently? (This could be covered by the first question, depending on its answer.)


asked Mar 01 '19 05:03

personal_cloud

2 Answers

TL;DR: a+i and &a[i] are both well-defined and produce a null pointer when a is a null pointer and i is 0, according to (the intent of) the standard, and all compilers agree.

a+i is clearly well-defined per [expr.add]/4 of the latest draft standard:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.

[...]

&a[i] is trickier. Per [expr.sub]/1, a[i] is equivalent to *(a+i), so &a[i] is equivalent to &*(a+i). The standard is not entirely clear about whether &*(a+i) is well-defined when a+i is a null pointer, but as @n.m. points out in a comment, the intent recorded in CWG 232 is to permit this case.
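For the uncontroversial case, where the index refers to a real element, the equivalence can be checked directly. The following is only a minimal sketch: `same_address` and `buf` are hypothetical names introduced for illustration and do not appear in the standard or in the question.

```cpp
#include <cstddef>

// Hypothetical helper: true when a+i, &a[i] and &*(a+i) all yield the same
// pointer. Only meaningful when a[i] is an existing element.
constexpr bool same_address(const char* a, std::size_t i) {
    return (a + i == &a[i]) && (&a[i] == &*(a + i));
}

// Hypothetical 8-element array used for the compile-time checks.
constexpr char buf[8] = {};

// Indices 0..7 name real elements, so every expression here is well-defined.
static_assert(same_address(buf, 0), "first element");
static_assert(same_address(buf, 7), "last element");
```

Because the checks sit in `static_assert`, a compiler that considered any of these evaluations undefined would have to reject the program.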

Since core language UB is required to be caught in a constant expression ([expr.const]/(4.6)), we can test whether compilers think these two expressions are UB.

Here's the demo. If a compiler considers the constant expression in static_assert to be UB, or considers the result not to be true, then it must produce a diagnostic (error or warning) per the standard:

(Note that this uses single-parameter static_assert and constexpr lambdas, which are C++17 features, and default lambda arguments, which are also fairly new.)

static_assert(nullptr == [](char* a=nullptr, int i=0) {
    return a+i;
}());

static_assert(nullptr == [](char* a=nullptr, int i=0) {
    return &a[i];
}());

From https://godbolt.org/z/hhsV4I, it seems all compilers behave uniformly in this case, producing no diagnostics at all (which surprises me a bit).

However, this is different from the offsetof case. The implementation posted in that question explicitly creates a reference (which is necessary to sidestep a user-defined operator&), and is thus subject to the requirements on references.
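The implementation from that question is not reproduced here. As a rough illustration of why a reference gets involved at all, the sketch below uses `std::addressof`, which takes its argument by reference and so returns the real address even when `operator&` is overloaded. The type `Odd` and the function `addressof_sidesteps_overload` are hypothetical names made up for this example.

```cpp
#include <memory>

// Hypothetical type whose user-defined operator& does not return the address.
struct Odd {
    int value = 0;
    Odd* operator&() { return nullptr; }
};

// Returns true when the overloaded operator& yields nullptr while
// std::addressof still yields the object's real (non-null) address.
inline bool addressof_sidesteps_overload() {
    Odd o;
    Odd* via_operator = &o;                  // calls Odd::operator&
    Odd* via_addressof = std::addressof(o);  // binds a reference to o first
    return via_operator == nullptr && via_addressof != nullptr;
}
```

Here the reference is bound to a real object, so nothing is undefined. The problem in the offsetof case is that the reference would be bound to the result of dereferencing a null pointer.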


answered Oct 26 '22 23:10

cpplearner

In the C++ standard, section [expr.sub]/1 you can read:

The expression E1[E2] is identical (by definition) to *((E1)+(E2)).

This means that &a[i] is exactly the same as &*(a+i). So you would first dereference the pointer with * and then take the address with &. If the pointer is invalid (e.g. nullptr, but also out of range), this is UB.

a+i is based on pointer arithmetic. At first it looks less dangerous, since there is no dereferencing that would be UB for sure. However, it may also be UB; see [expr.add]/4:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.

So, while the semantics behind these two expressions are slightly different, I would say that the result is the same in the end.
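The range condition quoted from [expr.add]/4 can be written out as a small predicate. This is only a sketch of the rule's arithmetic, not something the standard provides: `add_is_defined` is a hypothetical helper, and it assumes that i + j does not overflow.

```cpp
#include <cstddef>

// Hypothetical helper: given that P points to x[i] in an array of n elements,
// reports whether P + j satisfies the quoted condition 0 <= i + j <= n.
constexpr bool add_is_defined(std::ptrdiff_t i, std::ptrdiff_t j, std::ptrdiff_t n) {
    return 0 <= i + j && i + j <= n;
}

static_assert(add_is_defined(0, 4, 4), "one past the end is allowed");
static_assert(!add_is_defined(0, 5, 4), "two past the end is undefined");
static_assert(!add_is_defined(0, -1, 4), "before the first element is undefined");
static_assert(add_is_defined(3, -3, 4), "stepping back to x[0] is allowed");
```

Note that i + j == n (the one-past-the-end pointer) passes the check: forming that pointer is allowed, even though it does not point to an element.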

answered Oct 27 '22 00:10

Christophe
