
On what architectures is calculating invalid pointers unsafe?

int* a = new int[5] - 1;

This line by itself invokes undefined behavior according to the C++ standard, because a is an invalid pointer rather than a one-past-the-end pointer. At the same time it is a zero-overhead way of making a 1-based array (the first element is a[1]), which I need for a project of mine.

I'm wondering if this is something that I need to avoid or if the C++ standard is just being conservative to support some bizarre architectures that my code is never going to run on anyway. So the question is, on what architectures will this be a problem? Are any of those widespread?

Edit: To see that the line above does indeed invoke undefined behavior, take a look at this question.

Edit: Dennis Zickefoose points out that compilers are allowed to do anything when undefined behavior is invoked, so both the compiler and the CPU have to offer guarantees beyond the C++ standard for code like this to work. I'm expanding the question to whether any modern C++ compilers have this issue.

asked Jul 03 '11 by Bjarke H. Roune


4 Answers

Either way, a well-defined, zero-overhead way of creating a one-based array is the following:

int* a = new int[6];

There, problem solved. ;-) (But interesting question still.)
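
A minimal usage sketch of that approach (the loop and variable names here are only illustrative):

int* a = new int[6];   // a[0] is deliberately left unused

for (int i = 1; i <= 5; ++i)
    a[i] = i;          // valid 1-based indices: a[1] .. a[5]

delete[] a;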

answered by Konrad Rudolph


The hardware for doing these checks is present in all x86 processors; we are just not using it at the moment in the most popular operating systems.

If you use a segmented memory architecture, which we did for 16-bit systems, an allocation may well return an address of the form segment:0. In that case you just cannot subtract anything from that address!

Here is a starting point for reading about segmented memory and why loading an invalid segment is not possible:

http://en.wikipedia.org/wiki/Segment_descriptor

You have to decide whether this is unlikely to happen for your code, or whether you can instead define an overloaded operator[] that handles the offset for you.
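
A rough sketch of the operator[] idea (the class name and details are illustrative; copying and error handling are omitted):

#include <cstddef>

// Stores the pointer returned by new[] as-is and applies the 1-based
// offset inside operator[], so no invalid pointer is ever formed.
class OneBasedArray {
public:
    explicit OneBasedArray(std::size_t n) : data_(new int[n]) {}
    ~OneBasedArray() { delete[] data_; }

    int& operator[](std::size_t i) { return data_[i - 1]; }             // a[1] maps to data_[0]
    const int& operator[](std::size_t i) const { return data_[i - 1]; }

private:
    int* data_;
};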

answered by Bo Persson


Note: I am not going to answer your question. But judging from the comments, several experienced SO members do not seem to know that this behavior actually is undefined... So somewhere in here we ought to quote chapter and verse in some standard.

So...

From the C++0x standard (draft), section 5.7 (5), discussing the behavior of pointer addition (the final sentence is the key part):

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Similar language appears in every version of the C and C++ standards. Pointer arithmetic producing a result outside of the bounds of the array (plus one) is undefined behavior, even if you never dereference the pointer, and it always has been.
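
In concrete terms, for an array of five ints:

int* p = new int[5];

int* q   = p + 3;   // fine: points to an element of the array
int* end = p + 5;   // fine: one past the last element
int* bad = p - 1;   // undefined behavior: outside [p, p + 5], even if never dereferenced

delete[] p;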

...

Now, here is my own response to your question: You are asking the wrong question. Even if this never creates a problem on any current machine and compiler, you should be worried about future machines and compilers, because 90% of the cost of all software is maintenance. By coding rigidly to the spec, you guarantee that your code will work on any system that complies with the standard, whether today or 50 years from now. The risk of introducing subtle bugs for someone trying to maintain your code in the future is almost never outweighed by any benefit today.

Also, I am skeptical of the real-world performance difference. (I am not saying you are wrong; I am just saying I am skeptical.) While I admire your attempt to create "the world's fastest binary heap", I suspect you are underestimating the effects of the memory hierarchy. Sure, I believe that "multiply by two" is twice as fast as "multiply by two and add one". But I also believe that fetching a cache line from main memory is hundreds of times slower than either.

If you benchmark your heap by performing a small set of operations on it over and over, without doing anything else, you are probably operating entirely out of the L1 cache. But that is completely unrealistic. In real life, a heap is just a small piece of a large algorithm; the odds that the whole thing will always be sitting in the L1 cache are very low unless that algorithm is totally trivial. So the cost of memory access is likely to dominate in any real-world scenario.
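
For reference, the index arithmetic being compared is roughly the following: in a 1-based heap the children of node i sit at 2*i and 2*i + 1, while in a 0-based heap they sit at 2*i + 1 and 2*i + 2 (the function names below are just illustrative):

#include <cstddef>

// 1-based layout: children of node i
std::size_t left_one_based(std::size_t i)  { return 2 * i; }          // "multiply by two"
std::size_t right_one_based(std::size_t i) { return 2 * i + 1; }

// 0-based layout: children of node i
std::size_t left_zero_based(std::size_t i)  { return 2 * i + 1; }     // "multiply by two and add one"
std::size_t right_zero_based(std::size_t i) { return 2 * i + 2; }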

answered by Nemo


I think this could possibly be unsafe on some old 16-bit x86 systems. The address is "split" between an address register and a segment register, and I would guess that this could result in an invalid value being loaded into a segment register, which would cause an exception.

Probably not an issue as it's not a common architecture these days.

answered by jcoder