
What are the differences between a+i and &a[i] for pointer arithmetic in C++?

Supposing we have:

char* a;
int   i;

Many introductions to C++ (like this one) suggest that the rvalues a+i and &a[i] are interchangeable. I naively believed this for several decades, until I recently stumbled upon the following text (here) quoted from [dcl.ref]:

in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.

In other words, "binding" a reference object to a null-dereference causes undefined behavior. Based on the context of the above text, one infers that merely evaluating &a[i] (within the offsetof macro) is considered "binding" a reference. Furthermore, there seems to be a consensus that &a[i] causes undefined behavior in the case where a=null and i=0. This behavior is different from a+i (at least in C++, in the a=null, i=0 case).

This leads to at least 2 questions about the differences between a+i and &a[i]:

First, what is the underlying semantic difference between a+i and &a[i] that causes this difference in behavior? Can it be explained in terms of general principles, rather than "binding a reference to a null-dereferenced object causes undefined behavior just because it is a very specific case that everybody knows"? Is it that &a[i] might generate a memory access to a[i]? Or was the spec author simply unhappy with null dereferences that day? Or something else?

Second, besides the case where a=null and i=0, are there any other cases where a+i and &a[i] behave differently? (This may already be covered by the first question, depending on its answer.)

personal_cloud asked Mar 01 '19 05:03


2 Answers

TL;DR: a+i and &a[i] are both well-defined and produce a null pointer when a is a null pointer and i is 0, according to (the intent of) the standard, and the compilers tested below agree.


a+i is clearly well-defined per [expr.add]/4 of the latest draft standard:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
  • [...]

&a[i] is trickier. Per [expr.sub]/1, a[i] is equivalent to *(a+i), so &a[i] is equivalent to &*(a+i). The standard is not entirely clear about whether &*(a+i) is well-defined when a+i is a null pointer. But as @n.m. points out in a comment, the intent recorded in CWG issue 232 is to permit this case.
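To make the [expr.sub]/1 equivalence concrete, here is a minimal sketch that compares the three spellings for indices that refer to real elements of an array. The function name same_address_forms is made up for illustration, and the null-pointer case discussed above is deliberately excluded by the precondition.

```cpp
#include <cassert>

// Hypothetical helper (not from the question or the standard).
// Precondition: a points to the first element of an array of at least n chars.
// Returns true when a+i, &a[i] and &*(a+i) agree for every i in [0, n).
bool same_address_forms(char* a, int n) {
    assert(a != nullptr && n >= 0);      // keep the contested null case out of this sketch
    for (int i = 0; i < n; ++i) {
        char* by_addition  = a + i;      // pointer arithmetic only
        char* by_subscript = &a[i];      // subscript, then address-of
        char* by_deref     = &*(a + i);  // what [expr.sub]/1 says a[i] means
        if (by_addition != by_subscript || by_subscript != by_deref) {
            return false;
        }
    }
    return true;
}
```

Calling same_address_forms(buf, 8) on a char buf[8] should return true; for in-range indices the three forms are not in dispute.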


Since core language UB is required to be caught in a constant expression ([expr.const]/(4.6)), we can test whether compilers think these two expressions are UB.

Here's the demo. If the compilers think the constant expression in static_assert is UB, or if they think the result is not true, then they must produce a diagnostic (error or warning) per the standard:

(Note that this uses single-parameter static_assert and constexpr lambdas, which are C++17 features, and default lambda arguments, which have been allowed since C++14.)

static_assert(nullptr == [](char* a=nullptr, int i=0) {
    return a+i;
}());

static_assert(nullptr == [](char* a=nullptr, int i=0) {
    return &a[i];
}());

From https://godbolt.org/z/hhsV4I, it seems all compilers behave uniformly in this case, producing no diagnostics at all (which surprises me a bit).


However, this is different from the offsetof case. The implementation posted in that question explicitly creates a reference (which is necessary to sidestep a user-defined operator&), and is therefore subject to the requirements on references.
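As a rough illustration of why such code goes through a reference, the sketch below shows the usual trick for getting an object's real address when its class overloads unary &. The names Sneaky, real_address and overloaded_ampersand_is_bypassed are hypothetical and are not taken from the offsetof implementation in the linked question; the parameter T& obj is the reference binding that has to refer to an actual object.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose unary & is overloaded and hides the real address.
struct Sneaky {
    int value = 0;
    Sneaky* operator&() { return nullptr; }
};

// Hypothetical helper: bind a reference to the object, reinterpret it as a
// char lvalue (char has no overloaded operator&), take that address, and
// cast the result back to T*.
template <class T>
T* real_address(T& obj) {
    return reinterpret_cast<T*>(
        &const_cast<char&>(reinterpret_cast<const volatile char&>(obj)));
}

// Returns true when the overloaded & lies but the reference-based helper
// agrees with std::addressof.
bool overloaded_ampersand_is_bypassed() {
    Sneaky s;
    Sneaky* via_operator  = &s;               // calls Sneaky::operator&, yields nullptr
    Sneaky* via_reference = real_address(s);  // binds a reference to a real object
    assert(via_reference != nullptr);
    return via_operator == nullptr && via_reference == std::addressof(s);
}
```

Because real_address takes its argument by reference, calling it on a dereferenced null pointer would mean binding a reference to no object, which is exactly the situation the [dcl.ref] quote in the question rules out.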

cpplearner answered Oct 26 '22 23:10


In the C++ standard, section [expr.sub]/1 you can read:

The expression E1[E2] is identical (by definition) to *((E1)+(E2)).

This means that &a[i] is exactly the same as &*(a+i): the pointer is dereferenced (*) first, and the address is taken (&) second. If the pointer is invalid (e.g. nullptr, or out of range), this is UB.

a+i is plain pointer arithmetic. At first it looks less dangerous, since there is no dereference that would certainly be UB. However, it may also be UB (see [expr.add]/4):

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.
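The range condition in the quoted paragraph can be sketched as a guard that refuses to form the pointer when 0 ≤ i + j ≤ n does not hold. The helper name checked_add and its parameters are assumptions made for this example; returning nullptr on failure is just one possible design, not something the standard prescribes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: x points to the first element of an array of n chars,
// and P = x + i is a valid pointer into it (0 <= i <= n).
// Returns P + j when 0 <= i + j <= n, otherwise nullptr, so the out-of-range
// addition is never evaluated.
const char* checked_add(const char* x, std::ptrdiff_t n,
                        std::ptrdiff_t i, std::ptrdiff_t j) {
    assert(x != nullptr && n >= 0 && 0 <= i && i <= n);
    // Compare j against the remaining room instead of computing i + j,
    // which avoids signed overflow for extreme values of j.
    if (j < -i || j > n - i) {
        return nullptr;
    }
    return (x + i) + j;  // in range: may be the one-past-the-end pointer
}
```

For a char buf[4], checked_add(buf, 4, 1, 3) yields the one-past-the-end pointer buf + 4, which may be formed but not dereferenced, while checked_add(buf, 4, 1, 4) returns nullptr.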

So, while the semantics behind these two expressions are slightly different, I would say that the result is the same in the end.

Christophe answered Oct 27 '22 00:10