 

When exactly is a pointer difference defined?

Tags: arrays, c, pointers

I have a question about differences of pointers and the resulting type, ptrdiff_t.

C99 §6.5.6 (9) says:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined. In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i−j provided the value fits in an object of type ptrdiff_t.

§7.18.3 (2) requires ptrdiff_t to have a range of at least [−65535, +65535]

What I am interested in is the undefined behaviour if the result is too big. I couldn't find anything in the standard guaranteeing at least the same range as a signed version of size_t or anything similar. So, here is my question: Could a conforming implementation make ptrdiff_t a signed 16-bit type but size_t 64-bit? [edit: as Guntram Blohm pointed out, a 16-bit signed type only reaches 32767, so my example is obviously not conforming] As far as I can see, I cannot do any pointer subtraction on arrays with more than 65535 elements in strictly conforming code, even if the implementation supports objects much larger than that. Furthermore, the program may even crash.

The Rationale (V5.10) § 6.5.6 says:

It is important that this type [ptrdiff_t] be signed in order to obtain proper algebraic ordering when dealing with pointers within the same array. However, the magnitude of a pointer difference can be as large as the size of the largest object that can be declared; and since that is an unsigned type, the difference between two pointers can cause an overflow on some implementations.

which may explain why it is not required that every difference of pointers (to elements of the same array) is defined, but it does not explain why there isn't a restriction on PTRDIFF_MAX to be at least SIZE_MAX/2 (with integer division).

To illustrate, suppose T is any object type and n an object of size_t not known at compile time. I want to allocate space for n objects of T and I want to do pointer subtraction with addresses in the allocated range.

size_t half = sizeof(T)>1 ? 1 : 2; // (*) halve SIZE_MAX only if sizeof(T) == 1
if( SIZE_MAX/half/sizeof(T)<n ) /* do some error handling */;
size_t size = n * sizeof(T);
T *foo = malloc(size);
if(!foo) /* ... */;

would not be strictly conforming; I would have to write

if( SIZE_MAX/sizeof(T) < n || PTRDIFF_MAX < n )

instead. Is it really that way? And if so, does anyone know a reason for that (i.e., for not requiring PTRDIFF_MAX >= SIZE_MAX/2 [edit: changed > to >=] or something similar)?
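
To make that concrete, here is a self-contained sketch of the stricter check (the helper name alloc_for_diff is made up for illustration and is not part of the question; elem_size stands for sizeof(T)):

#include <stdint.h>
#include <stdlib.h>

// Allocate space for n elements of size elem_size so that both the
// multiplication and any pointer subtraction over the resulting array
// stay in range. Assumes elem_size > 0 (it comes from sizeof).
// Returns NULL on overflow or allocation failure.
void *alloc_for_diff(size_t n, size_t elem_size) {
    if (SIZE_MAX / elem_size < n || (uintmax_t)PTRDIFF_MAX < n)
        return NULL;               /* error handling left to the caller */
    return malloc(n * elem_size);
}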

(*) The half stuff in the first version is something I noticed while writing this text; I had

if( SIZE_MAX/2/sizeof(T) < n )

first, taking half of SIZE_MAX to avoid the problem mentioned in the Rationale; but then I realized we only need to halve SIZE_MAX if sizeof(T) is 1. Given this code, the second version (the one which is surely strictly conforming) doesn't seem so bad after all. But still, I am interested in whether I'm right.

C11 keeps the wording of §6.5.6 (9). C++-related answers to this topic are also welcome.

asked Dec 10 '13 by mafso

2 Answers

To answer the question in the title: the pointer subtraction operator itself cannot, in general, be used to determine the difference of two pointers without eventually running into undefined behavior. As you note, this would be a severe restriction on systems where PTRDIFF_MAX is much smaller than the possible size of an object. But such systems are rare (I don't know of any), so if you have code that depends on being able to take the difference within large objects, you can always put in something like

#if PTRDIFF_MAX < SIZE_MAX/2
# error "we need an architecture with sufficiently wide ptrdiff_t"
#endif

But even in such a case (a too-narrow ptrdiff_t) you will always be able to compute the difference between two pointers into the same large object:

  1. Determine which of the two pointers (p or q) is the smaller one. Relational comparison of pointers into the same object is always well defined.
  2. Say p is the smaller one; then test p + i for each size_t i starting at 1 until you reach q or i is SIZE_MAX.
  3. If the final i is SIZE_MAX and you didn't reach q, the difference is not representable. Otherwise, that i together with the appropriate sign is the information you are looking for. (A sketch of this linear scan follows below.)
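
Here is a minimal sketch of that linear scan, assuming both pointers point into the same array of bytes (the function name and the out-parameters are mine, for illustration only):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Computes |q - p| by stepping from the smaller pointer towards the larger one.
// Returns false if the distance exceeds SIZE_MAX (not representable).
static bool byte_distance_linear(char const *p, char const *q,
                                 size_t *dist, bool *negative) {
    *negative = q < p;                 // relational comparison is well defined within one object
    char const *lo = *negative ? q : p;
    char const *hi = *negative ? p : q;
    for (size_t i = 0; ; ++i) {
        if (lo + i == hi) { *dist = i; return true; }  // reached the other pointer
        if (i == SIZE_MAX) return false;               // distance would not fit in size_t
    }
}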

This is not really satisfactory, but I was not able to figure out how to improve that linear algorithm to something logarithmic: to avoid UB we wouldn't be allowed to go beyond q with the comparison.

And, as I said, you'd only need that on some really exotic architecture.

Edit:

With mafso's trick for getting the most significant bit of the pointer difference, this can be done in O(log(n)), where n is the required distance. First, declare two internal functions that assume p < q:

// computes the maximum value bit of the pointer difference
//
// assumes that p < q
inline
uintmax_t ptrdiff_maxbit(char const* p, char const* q) {
  uintmax_t len = 1;
  while (p+len <= q-len)
    len <<= 1;
  return len;
}

// compute the pointer difference
//
// assumes that p < q
// assumes that len2 is a power of two
// assumes that the difference is strictly less than 2*len2
inline
uintmax_t ptrdiff_bounded(char const* p, char const* q, uintmax_t len2) {
  if (len2 < 2) return len2;
  uintmax_t len = (len2 >> 1);
  p += len;
  q -= len;
  for (; len; len >>= 1)
    if (p + len <= q) {
      len2 |= len;
      p += len;
    }
  return len2;
}

Then define the function that computes the difference in bytes, with a convention for the case where the difference is not representable in intmax_t:

inline
intmax_t ptrdiff_byte(void const* p0, void const* q0) {
  char const * p = p0;
  char const * q = q0;
  if (p == q) return 0;
  if (p < q) {
    uintmax_t ret = ptrdiff_bounded(p, q, ptrdiff_maxbit(p, q));
    if (ret > (-(INTMAX_MIN+1))+UINTMAX_C(1)) return INTMAX_MIN;
    else return -ret;
  } else {
    uintmax_t ret = ptrdiff_bounded(q, p, ptrdiff_maxbit(q, p));
    if (ret > INTMAX_MAX) return INTMAX_MAX;
    else return ret;
  }
}

Finally, a macro that converts the byte difference into a difference in elements of the pointed-to type:

#define ptrdiff(P, Q) (ptrdiff_byte((P), (Q))/(intmax_t)sizeof(*(Q)))
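
A quick usage sketch (the array and indices are arbitrary and only illustrate the sign convention; this assumes the definitions above and <stdint.h> are in scope):

// inside some function:
double a[1000];
double *p = &a[990], *q = &a[10];
intmax_t d = ptrdiff(q, p);   // -980, same sign as the expression q - p
intmax_t e = ptrdiff(p, q);   //  980
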
answered Sep 26 '22 by Jens Gustedt


I remember that, back in the old days, some 16-bit 80x86 compilers had "large" or "huge" data models, where pointers were 32 bits but integers were still only 16 bits. These compilers allowed you to create arrays larger than 65536 bytes, but, with integers being only 16 bits, accessing elements outside the first 64K required pointer manipulation (which was really strange: a pointer consisted of a 16-bit segment value and a 16-bit offset value, with the real address being ((segment << 4) + offset)).
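
Purely for illustration, this is how such a real-mode address is formed from the two 16-bit halves (a sketch of the scheme described above, not code from any of those compilers):

#include <stdint.h>

// 8086 real-mode addressing: the 20-bit linear address is the 16-bit segment
// shifted left by 4, plus the 16-bit offset.
static uint32_t linear_address(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;   // e.g. 0x1234:0x0010 -> 0x12350
}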

I don't know how compliant these compilers were, but they would have had to define SIZE_MAX to something like 1M (since that's the largest object you can address given the strange pointer model), while ptrdiff_t would have been a 16-bit integer (which wouldn't be compliant, as its range is only -32768 to +32767).

So, a sane C implementation on sane hardware wouldn't have any reason to make PTRDIFF_MAX less than SIZE_MAX. But there might be exotic hardware (which, in the case of the 80x86, wasn't really exotic at that time) that allows you to define large arrays but doesn't allow you to access all of them "at once". In that case, PTRDIFF_MAX could well be below SIZE_MAX/2.

answered Sep 24 '22 by Guntram Blohm