
Confusion regarding types, overflows and UB in pointer-integral addition

I used to think that adding an integral type to a pointer (provided that the pointer points into an array of sufficient size, etc.) is always well defined, regardless of the integral type. The C++11 standard says ([expr.add]):

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
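As a concrete reading of that paragraph, here is a minimal sketch. The array `a` and the helper name `demo_offsets` are made up for illustration and are not part of the standard text.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: walks through the pointer/integer additions that the
// quoted [expr.add] paragraph describes, on a 5-element array.
bool demo_offsets() {
    int a[5] = {10, 11, 12, 13, 14};
    std::size_t n = 3;       // an unsigned integral operand
    int *p = &a[1];          // P points to the i-th element, with i = 1
    int *q = p + n;          // (P)+N: the (i + n)-th element, i.e. a[4]
    int *r = n + p;          // N+(P) is equivalent
    int *end = q + 1;        // one past the last element: may be formed, not dereferenced
    int *last = end - 1;     // (Q)-1 points back to the last element
    assert(q == &a[4] && r == q);
    assert(last == q);
    // Forming p + 5 or p - 2 would land outside the array and its
    // one-past-the-end position, which the quoted text makes undefined.
    return *q == 14 && end == a + 5;
}
```

Calling `demo_offsets()` should return `true` on any conforming implementation, since every pointer it forms stays within the array or one past its end.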

On the other hand, it was brought to my attention recently that the built-in addition operators for pointers are defined in terms of ptrdiff_t, which is a signed type (see 13.6/13). This seems to hint that if one does a malloc() with a very large (unsigned) size and then tries to reach the end of the allocated space via a pointer addition with a std::size_t value, this might result in undefined behaviour, because the unsigned std::size_t value would first be converted to ptrdiff_t, which may not be able to represent it.

I imagine similar issues would arise, e.g., in the operator[]() of std::vector, which is implemented in terms of an unsigned size_type. In general, it seems to me that this would make it practically impossible to fully use the memory available on a platform.

It's worth noting that neither GCC nor Clang complains about signed-unsigned integral conversions when adding unsigned values to pointers, even with all the relevant diagnostics turned on.

Am I missing something?

EDIT: I'd like to clarify that I am talking about additions involving a pointer and an integral type (not two pointers).

EDIT2: an equivalent way of formulating the question might be this. Does this code result in UB in the second line, if ptrdiff_t has a smaller positive range than size_t?

char *ptr = static_cast<char *>(std::malloc(std::numeric_limits<std::size_t>::max()));
auto end = ptr + std::numeric_limits<std::size_t>::max();
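Whether the precondition of EDIT2 holds at all is a property of the platform. The following sketch shows one way to check it; the function name `size_exceeds_ptrdiff` is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when some std::size_t values are too large to be
// represented as a std::ptrdiff_t. Both maxima are widened to std::uintmax_t
// first so that the comparison itself cannot lose information.
constexpr bool size_exceeds_ptrdiff() {
    return static_cast<std::uintmax_t>(std::numeric_limits<std::size_t>::max()) >
           static_cast<std::uintmax_t>(std::numeric_limits<std::ptrdiff_t>::max());
}
```

On common platforms where the two types have the same width, this is expected to be true, but the result is implementation-specific.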
Asked Jun 12 '15 at 13:06 by bluescarni


1 Answer

Your question is based on a false premise.

Subtraction of pointers produces a ptrdiff_t (§[expr.add]/6):

When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the <cstddef> header (18.2).

That does not, however, mean that addition is defined in terms of ptrdiff_t. Quite the contrary: for addition, only one conversion is specified (§[expr.add]/1):

The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.

The "usual arithmetic conversions" are defined in §[expr]/10. This includes only one conversion from unsigned type to signed type:

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.

So, while there may be some room for question about exactly what type the size_t will be converted to (and whether it's converted at all), there's no question on one point: the only way it can be converted to a ptrdiff_t is if all its values can be represented without change as a ptrdiff_t.
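The rule can be observed at compile time. This is only a sketch: the names `long_long_holds_unsigned` and `mixed_t` are made up for illustration, and the result type of the mixed expression depends on the platform's integer widths.

```cpp
#include <limits>
#include <type_traits>

// Same rank: the signed operand is converted to the unsigned type, never the
// other way round.
static_assert(std::is_same<decltype(1u + 1), unsigned int>::value,
              "unsigned int + int yields unsigned int");

// An unsigned operand is converted to a signed type only when that signed
// type can represent every value of the unsigned type.
constexpr bool long_long_holds_unsigned =
    std::numeric_limits<long long>::max() >= std::numeric_limits<unsigned int>::max();

using mixed_t = decltype(1u + 1LL);

static_assert(std::is_same<mixed_t,
                           std::conditional<long_long_holds_unsigned,
                                            long long,
                                            unsigned long long>::type>::value,
              "unsigned-to-signed conversion happens only when it preserves every value");
```

If both assertions compile, no unsigned value was converted to a signed type that could not hold it, which is the only point the argument above relies on.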

So, given:

size_t N;
T *p;

...the expression p + N will never fail because of some (imagined) conversion of N to a ptrdiff_t before the addition takes place.
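A minimal sketch of that claim, using a small array so that it can actually be run. The names `end_of` and `demo_end_of` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: forms the one-past-the-end pointer of an array by
// adding an unsigned element count to a pointer to its first element.
template <typename T>
T *end_of(T *first, std::size_t count) {
    // Well defined as long as the result stays within the array or one past
    // its last element; count is not required to fit in a std::ptrdiff_t.
    return first + count;
}

bool demo_end_of() {
    char buf[1000] = {};
    std::size_t N = sizeof(buf);
    char *p = buf;
    char *end = end_of(p, N);
    assert(end == &buf[999] + 1);   // one past the last element
    return end - 1 == &buf[999];
}
```

The array here is far too small to exercise the large-allocation case from the question; it only shows the shape of the expression being discussed.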

Since §13.6 is being mentioned, perhaps it's best to back up and look carefully at what §13.6 really is:

The candidate operator functions that represent the built-in operators defined in Clause 5 are specified in this subclause. These candidate functions participate in the operator overload resolution process as described in 13.3.1.2 and are used for no other purpose.

[emphasis added on "and are used for no other purpose"]

In other words, the fact that §13.6 defines an operator that adds a ptrdiff_t to a pointer does not mean that when any other integer type is added to a pointer, it's first converted to a ptrdiff_t, or anything like that. More generally, the operators defined in §13.6 are never used to carry out any arithmetic operations.
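To illustrate where the §13.6 candidates do come into play, here is a sketch with a class-type operand. The `Offset` type and the function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class type that is implicitly convertible to std::ptrdiff_t.
struct Offset {
    std::ptrdiff_t value;
    operator std::ptrdiff_t() const { return value; }
};

int element_at(const int *base, Offset off) {
    // Because `off` has class type, overload resolution is performed. The
    // built-in candidate `const int* operator+(const int*, std::ptrdiff_t)`
    // is viable via Offset's conversion function, so it is selected. Only
    // then does the ordinary [expr.add] rule carry out the addition.
    return *(base + off);
}

bool demo_candidate() {
    const int data[4] = {1, 2, 3, 4};
    Offset off = {2};
    assert(element_at(data, off) == 3);
    return element_at(data, Offset{0}) == 1;
}
```

With a plain `std::size_t` operand there is no class type involved, so this overload-resolution step is not what determines how the addition behaves.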

With that, and the rest of the text you quoted from §[expr.add], we can quickly conclude that adding a size_t to a pointer can overflow if and only if there aren't that many elements in the array after the pointer.

Given the above, one more question probably occurs to you. If I have code like this:

char *p = huge_array;
size_t N = sizeof(huge_array);
char *p2 = p + N;

ptrdiff_t diff = p2 - p;

...is it possible that the final subtraction will overflow? The short and simple answer to that is: Yes, it can.
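Because the result of the subtraction has type std::ptrdiff_t, code that cannot rule out such sizes might check the element count before subtracting. This is a sketch of one possible guard; `fits_in_ptrdiff` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: true when `count` can be represented as a
// std::ptrdiff_t, i.e. when the difference of two pointers that far apart
// within one array is representable in the result type.
constexpr bool fits_in_ptrdiff(std::size_t count) {
    return static_cast<std::uintmax_t>(count) <=
           static_cast<std::uintmax_t>(std::numeric_limits<std::ptrdiff_t>::max());
}
```

A caller could test `fits_in_ptrdiff(N)` before evaluating `p2 - p` and fall back to tracking the size as a `size_t` when it returns false.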

Answered Nov 15 '22 at 07:11 by Jerry Coffin