Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rationale for pointer comparisons outside an array to be UB

So, the standard (referring to N1570) says the following about comparing pointers:

C99 6.5.8/5 Relational operators

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. ... [snip obvious definitions of comparison within aggregates] ... In all other cases, the behavior is undefined.

What is the rationale for this instance of UB, as opposed to specifying (for instance) conversion to intptr_t and comparison of that?

Is there some machine architecture where a sensible total ordering on pointers is hard to construct? Is there some class of optimization or analysis that unrestricted pointer comparisons would impede?

A deleted answer to this question mentions that this piece of UB allows for skipping comparison of segment registers and only comparing offsets. Is that particularly valuable to preserve?

(That same deleted answer, as well as one here, note that in C++, std::less and the like are required to implement a total order on pointers, whether the normal comparison operator does or not.)

like image 415
Phil Miller Avatar asked Jul 01 '15 01:07

Phil Miller


People also ask

What happens when you compare two pointers?

When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal.

Can pointers be compared?

We can compare pointers if they are pointing to the same array. Relational pointers can be used to compare two pointers. Pointers can't be multiplied or divided.

Can we use relational operators on pointers?

Relational Operators Relational operations are often used to compare the values of the variable based on which we can take decisions. The given table shows the relational operators that can be performed on pointer variables. The value of the relational expression is either 0 or 1 that is false or true.


2 Answers

Various comments in the ub mailing list discussion Justification for < not being a total order on pointers? strongly allude to segmented architectures being the reason. Including the follow comments, 1:

Separately, I believe that the Core Language should simply recognize the fact that all machines these days have a flat memory model.

and 2:

Then we maybe need an new type that guarantees a total order when converted from a pointer (e.g. in segmented architectures, conversion would require taking the address of the segment register and adding the offset stored in the pointer).

and 3:

Pointers, while historically not totally ordered, are practically so for all systems in existence today, with the exception of the ivory tower minds of the committee, so the point is moot.

and 4:

But, even if segmented architectures, unlikely though it is, do come back, the ordering problem still has to be addressed, as std::less is required to totally order pointers. I just want operator< to be an alternate spelling for that property.

Why should everyone else pretend to suffer (and I do mean pretend, because outside of a small contingent of the committee, people already assume that pointers are totally ordered with respect to operator<) to meet the theoretical needs of some currently non-existent architecture?

Counter to the trend of comments from the ub mailing list, FUZxxl points out that supporting DOS is a reason not to support totally ordered pointers.

Update

This is also supported by the Annotated C++ Reference Manual(ARM) which says this was due to burden of supporting this on segmented architectures:

The expression may not evaluate to false on segmented architectures [...] This explains why addition, subtraction and comparison of pointers are defined only for pointers into an array and one element beyond the end. [...] Users of machines with a nonsegmented address space developed idioms, however, that referred to the elements beyond the end of the array [...] was not portable to segmented architectures unless special effort was taken [...] Allowing [...] would be costly and serve few useful purposes.

like image 108
Shafik Yaghmour Avatar answered Oct 09 '22 21:10

Shafik Yaghmour


The 8086 is a processor with 16 bit registers and a 20 bit address space. To cope with the lack of bits in its registers, a set of segment registers exists. On memory access, the dereferenced address is computed like this:

address = 16 * segment + register

Notice that among other things, an address has generally multiple ways to be represented. Comparing two arbitrary addresses is tedious as the compiler has to first normalize both addresses and then compare the normalized addresses.

Many compilers specify (in the memory models where this is possible) that when doing pointer arithmetic, the segment part is to be left untouched. This has several consequences:

  • objects can have a size of at most 64 kB
  • all addresses in an object have the same segment part
  • comparing addresses in an object can be done just by comparing the register part; that can be done in a single instruction

This fast comparison of course only works when the pointers are derived from the same base-address, which is one of the reasons why the C standard defines pointer comparisons only for when both pointers point into the same object.

If you want a well-ordered comparison for all pointers, consider converting the pointers to uintptr_t values first.

like image 32
fuz Avatar answered Oct 09 '22 21:10

fuz