Does taking address of member variable through a null pointer yield undefined behavior?

The following code (or its equivalent, which uses an explicit cast of the null literal to get rid of the temporary variable) is often used to calculate the offset of a specific member variable within a class or struct:

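#include <cstddef>  // std::size_t

class Class {
public:
    int first;
    int second;
};

// The idiom in question: pretend an object lives at address 0 and
// subtract the two addresses to get the member's offset.
Class* ptr = 0;
std::size_t offset = reinterpret_cast<char*>(&ptr->second) -
                     reinterpret_cast<char*>(ptr);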

&ptr->second looks like it is equivalent to the following:

&(ptr->second)

which in turn is equivalent to

&((*ptr).second)

which dereferences an object instance pointer and yields undefined behavior for null pointers.

So is the original fine or does it yield UB?

sharptooth asked Sep 08 '14 13:09


1 Answer

Despite the fact that it does nothing, char* foo = 0; *foo; could be undefined behavior.

Dereferencing a null pointer could be undefined behavior. And yes, ptr->foo is equivalent to (*ptr).foo, and *ptr dereferences a null pointer.
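To make the equivalence concrete, here is a small sketch that reuses the question's Class and compares the three spellings through a pointer to a live object. `check_spellings` is a hypothetical helper name, and the sketch deliberately never passes a null pointer, so it only demonstrates that the expressions denote the same member; it says nothing about whether the null case is defined.

```cpp
#include <cassert>

class Class {
public:
    int first;
    int second;
};

// Given a pointer to a live object, the three spellings from the question
// denote the same member subobject, so they yield the same address.
void check_spellings(Class* ptr) {
    assert(ptr != nullptr);            // this sketch never passes null
    int* a = &ptr->second;             // as written in the question
    int* b = &(ptr->second);           // explicit parentheses, same parse
    int* c = &((*ptr).second);         // -> spelled as an explicit dereference
    assert(a == b && b == c);
}
```

Usage: `Class obj{1, 2}; check_spellings(&obj);` passes. The interesting question is only what happens when the `*ptr` step is reached with a null pointer, which no well-defined test can exercise.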

There is currently an open issue in the working groups (CWG issue 232) about whether *(char*)0 is undefined behavior if you don't read from or write to it. Parts of the standard imply that it is; other parts imply that it is not. The current notes there seem to lean towards making it defined.

Now, this is in theory. How about in practice?

Under most compilers this works, because no checks are done at dereference time: the memory around where a null pointer points is guarded against access, but the expression above merely takes the address of something near null; it never reads or writes the value there.

This is why cppreference's offsetof page lists essentially that trick as a possible implementation. The fact that some (many? most? every one I've checked?) compilers implement offsetof in a similar or equivalent manner does not mean that the behavior is well defined under the C++ standard.
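For comparison, a minimal sketch of the route the standard does sanction, assuming the question's Class: use the `offsetof` macro from `<cstddef>`, and, if a cross-check is wanted, measure the offset on a real object so that no null pointer is ever formed. The names `second_offset`, `measured_second_offset` and `check_offsets` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t, std::ptrdiff_t

class Class {
public:
    int first;
    int second;
};

// The standard macro: valid for standard-layout types such as Class,
// and usable in constant expressions.
constexpr std::size_t second_offset = offsetof(Class, second);

// Cross-check that measures the offset on a real object instead of
// pretending an object lives at address 0.
std::ptrdiff_t measured_second_offset() {
    Class obj{};
    return reinterpret_cast<const char*>(&obj.second) -
           reinterpret_cast<const char*>(&obj);
}

void check_offsets() {
    static_assert(offsetof(Class, first) == 0,
                  "first member of a standard-layout class is at offset 0");
    assert(static_cast<std::size_t>(measured_second_offset()) == second_offset);
}
```

Whatever a given library does inside the macro is the implementation's business; user code that calls `offsetof` does not itself dereference a null pointer.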

However, given the ambiguity, compilers are free to add checks at every instruction that dereferences a pointer, and to execute arbitrary code (fail-fast error reporting, for example) if a null pointer is indeed dereferenced. Such instrumentation might even be useful for finding bugs where they occur, instead of where the symptom appears. And on systems where there is writable memory near 0, such instrumentation could be key (pre-OS X Mac OS had some writable memory near 0 that controlled system functions).

Such compilers could still write offsetof that way, and introduce pragmas or the like to block the instrumentation in the generated code. Or they could switch to an intrinsic.
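The fail-fast instrumentation described above can be approximated by hand. In this sketch `checked_deref` is a hypothetical helper, not a feature of any real compiler: it stands in for the check an instrumenting compiler might insert before each dereference, and it reports the error by throwing `std::logic_error`, which is just one of many possible reporting strategies.

```cpp
#include <cassert>
#include <stdexcept>

class Class {
public:
    int first;
    int second;
};

// Hypothetical helper modelling an inserted check: fail at the faulty
// expression itself rather than wherever a symptom eventually shows up.
template <class T>
T& checked_deref(T* p) {
    if (p == nullptr) {
        throw std::logic_error("null pointer dereferenced");
    }
    return *p;
}

// &ptr->second with the inserted check made explicit.
int* checked_address_of_second(Class* ptr) {
    return &checked_deref(ptr).second;
}

// Usage: a live object passes straight through; null is reported at once.
void demo_checked_deref() {
    Class obj{1, 2};
    assert(checked_address_of_second(&obj) == &obj.second);
    bool reported = false;
    try {
        checked_address_of_second(nullptr);
    } catch (const std::logic_error&) {
        reported = true;
    }
    assert(reported);
}
```

Under such a scheme the null-pointer offsetof trick would be reported as an error, which is exactly why such a compiler would need a pragma or an intrinsic for its own offsetof.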

Going a step further, C++ leaves lots of latitude in how non-standard-layout data is arranged. In theory, classes could be implemented as rather complex data structures rather than the nearly standard-layout structures we have grown to expect, and the code would still be valid C++. Accessing member variables of non-standard-layout types and taking their addresses could be problematic: I do not know if there is any guarantee that the offset of a member variable in a non-standard-layout type does not change between instances!
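Whether a type has the predictable standard-layout arrangement can at least be checked at compile time with `std::is_standard_layout` from `<type_traits>`. A sketch, assuming the question's Class plus two made-up types, `Mixed` and `Polymorphic`, and a hypothetical trait name `offsetof_is_guaranteed`: since C++17, using `offsetof` on a non-standard-layout type is only conditionally supported, so a guard like this keeps such types away from offset computations.

```cpp
#include <type_traits>

class Class {            // the question's type: standard-layout
public:
    int first;
    int second;
};

class Mixed {            // data members with different access control:
public:                  // not standard-layout
    int first;
private:
    int second;
public:
    int get_second() const { return second; }
};

class Polymorphic {      // a virtual function also disqualifies the type
public:
    virtual ~Polymorphic() = default;
    int first;
};

// Hypothetical trait: offsetof is only guaranteed for standard-layout types.
template <class T>
constexpr bool offsetof_is_guaranteed = std::is_standard_layout<T>::value;

static_assert(offsetof_is_guaranteed<Class>, "Class is standard-layout");
static_assert(!offsetof_is_guaranteed<Mixed>, "mixed access control");
static_assert(!offsetof_is_guaranteed<Polymorphic>, "virtual function");
```

The check only tells you whether the standard promises a fixed layout; it does not make address tricks on `Mixed` or `Polymorphic` any safer.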

Finally, some compilers have aggressive optimization settings that find code that executes undefined behavior (at least under certain branches or conditions) and use that to mark the branch as unreachable. If it is decided that null dereference is undefined behavior, this could be a problem. A classic example is GCC's aggressive elimination of branches that depend on signed integer overflow. If the standard dictates that something is undefined behavior, the compiler is free to consider that branch unreachable. If the null dereference is not behind a branch in a function, the compiler is free to declare all code that calls that function unreachable, and to recurse from there.

And it would be free to start doing this not in the current version of your compiler, but in the next one.
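The signed-overflow case can be shown in a few lines. This sketch keeps the problematic pattern in a comment and gives well-defined alternatives that compare against `INT_MAX`/`INT_MIN` before doing any arithmetic. Whether a particular GCC version actually folds the commented-out comparison depends on the version and the optimization flags; `increment_overflows` and `add_overflows` are illustrative names.

```cpp
#include <climits>

// The pattern an optimizer may exploit: signed overflow is undefined, so
// the compiler may assume x + 1 never wraps and fold this to `false`.
//     bool increment_overflows_ub(int x) { return x + 1 < x; }
//
// Well-defined alternatives test the limits *before* any arithmetic, so
// there is no undefined branch for the optimizer to discard.
constexpr bool increment_overflows(int x) {
    return x == INT_MAX;
}

constexpr bool add_overflows(int a, int b) {
    return (b > 0 && a > INT_MAX - b) ||   // INT_MAX - b cannot overflow here
           (b < 0 && a < INT_MIN - b);     // INT_MIN - b cannot overflow here
}
```

The same reasoning applies to the null case: code that never executes the questionable operation gives the optimizer nothing to assume away.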

Writing standards-valid code is not just about writing code that compiles cleanly today. While the extent to which dereferencing a null pointer without using the result is defined remains ambiguous, relying on something that is only ambiguously defined is risky.

Yakk - Adam Nevraumont answered Oct 12 '22 17:10