Note: if, after reading this question, you think "how can that even happen?", that is OK. If you want to keep an open mind, the points after the question show how this can happen and why it is useful. Just remember that this is a question and not a tutorial on any of these topics. The comments already have enough noise and are hard to follow. If you have questions about these topics, I would appreciate it if you posted them as questions on SO instead of in the comments.
Question: If I have an object of type int stored at the address pointed to by c
int* c = /* allocate int (returns unique address) */;
*c = 3;
referred to by two pointers a and b:
int* a = /* create pointer to (*c) */;
int* b = /* create pointer to (*c) */;
such that:
assert(a != b); // the pointers point to a different address
assert(*b == 3);
*a = 2;
assert(*b == 2); // but they refer to the same value
Is this undefined behavior? If yes, which part of the C++ standard disallows this? If not, which part of the C++ standard allows this?
Note: the memory c points to is allocated with a memory allocation function that returns a unique address (new, malloc, ...). The way to create these pointers with different values is very platform specific, although on most Unix systems it can be done with mmap and on Windows it can be done with VirtualAlloc.
Background: most operating systems (those whose userspace does not run in ring 0) run their processes in virtual memory and keep a map from virtual memory pages to physical memory pages. Some of these systems (Linux/macOS/BSDs/Unixes and 64-bit Windows) provide system calls (like mmap or VirtualAlloc) that can be used to map two virtual memory pages to the same physical memory page. When a process does this, it can access the same page of physical memory from two different virtual addresses. That is, those two pointers will have different values, but they will access the same physical storage. Keywords to google for: mmap, virtual memory, memory pages. Data structures that exploit this feature are "magic ring buffers" (that's the technical term) and non-reallocating dynamically-sized vectors (that is, vectors that do not need to reallocate memory when they grow). Google provides more information about these than I could ever fit here.
Very minimal probably non-working example (unix only):
We first allocate an int on the heap. The following requests an anonymous, non-file-backed mapping of virtual memory. One must request at least a whole memory page here, but for simplicity I'll just request the size of an int (mmap will allocate a whole memory page anyway):
int* c = static_cast<int*>(mmap(NULL, sizeof(int), PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0));
Now we need to map this to two independent memory locations, so we map it to the same memory-mapped file, twice, to, e.g., two adjacent memory locations. We won't really use this file, but we still need to create it and open it:
mmap(c, sizeof(int), PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, some_fd, 0);
mmap(c + 1, sizeof(int), PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, some_fd, 0);
Now we are almost done:
int* a = c;
int* b = c + 1;
These are obviously different virtual addresses:
assert(a != b);
But they point to the same, non-file-backed, physical memory page:
*a = 314;
assert(*b == 314);
So there you go. Using VirtualAlloc the same can be done on Windows, but the API is a bit different.
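For readers who want something closer to compilable, here is a minimal sketch of the same idea for POSIX systems. It is not the code from the question: the names AliasedInts, make_aliased_ints and write_is_visible_through_alias, and the temporary file path, are made up for illustration. It also differs from the snippet above in one respect: the second mapping is placed one page (not one int) after the first, because MAP_FIXED requires a page-aligned address.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstdlib>

// Hypothetical helper type: two distinct virtual addresses that are
// intended to be backed by the same page of a shared file mapping.
struct AliasedInts {
    int* a;
    int* b;
};

// Sketch only. Returns {nullptr, nullptr} if any step fails.
AliasedInts make_aliased_ints() {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return {nullptr, nullptr};

    // Create a throwaway backing file of one page (path is an assumption).
    char path[] = "/tmp/alias-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1) return {nullptr, nullptr};
    unlink(path);  // the file lives on until the descriptor and mappings go away
    if (ftruncate(fd, page) == -1) {
        close(fd);
        return {nullptr, nullptr};
    }

    // Reserve two adjacent pages of address space.
    void* base = mmap(nullptr, 2 * page, PROT_NONE,
                      MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return {nullptr, nullptr};
    }

    // Map the same file page over both reserved pages.
    char* p = static_cast<char*>(base);
    void* first = mmap(p, page, PROT_READ | PROT_WRITE,
                       MAP_FIXED | MAP_SHARED, fd, 0);
    void* second = mmap(p + page, page, PROT_READ | PROT_WRITE,
                        MAP_FIXED | MAP_SHARED, fd, 0);
    close(fd);  // the mappings keep the file alive
    if (first == MAP_FAILED || second == MAP_FAILED) {
        munmap(base, 2 * page);
        return {nullptr, nullptr};
    }
    return {static_cast<int*>(first), static_cast<int*>(second)};
}

// Writes through one address and reads through the other.
bool write_is_visible_through_alias() {
    AliasedInts p = make_aliased_ints();
    if (p.a == nullptr) return false;
    assert(p.a != p.b);  // different virtual addresses
    *p.a = 314;
    return *p.b == 314;  // same backing page, as far as the OS is concerned
}
```

Whether the final read is something the C++ standard guarantees is exactly what the question asks; the sketch only shows what the operating system sets up.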
First let's look at what the standard has to say about an object:
[intro.object]
The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is a region of storage. [ Note: A function is not an object, regardless of whether or not it occupies storage in the way that objects do. —end note ] An object is created by a definition (3.1), by a new-expression (5.3.4) or by the implementation (12.2) when needed. The properties of an object are determined when the object is created. An object can have a name (Clause 3). An object has a storage duration (3.7) which influences its lifetime (3.8). An object has a type (3.9). The term object type refers to the type with which the object is created. Some objects are polymorphic (10.3); the implementation generates information associated with each such object that makes it possible to determine that object’s type during program execution. For other objects, the interpretation of the values found therein is determined by the type of the expressions (Clause 5) used to access them.
And then we have
Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies. Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses.
So we know that an object has an address and it is the first byte of the storage it uses. If we look at what a byte is we have
[intro.memory]
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.
Emphasis mine
So if we have a pointer to an object, the pointer is going to hold a unique value (address). If we have another pointer to that same object, then it must hold that same value (address). Undefined behavior does not even enter the equation, as you simply cannot have two pointers to the same object that have different values.
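The address rules quoted above can be illustrated with ordinary objects. This is only a small sketch; the struct Wrapper and the function names are invented for the example and do not come from the standard.

```cpp
#include <cassert>

// Hypothetical standard-layout struct used to show the subobject case.
struct Wrapper {
    int first;
    int second;
};

// Two pointers to the same object hold the same address.
bool same_object_same_address() {
    int x = 3;
    int* p = &x;
    int* q = &x;
    return p == q;
}

// Two distinct complete objects have distinct addresses.
bool distinct_objects_distinct_addresses() {
    int x = 3;
    int y = 3;
    return &x != &y;
}

// An object and its first member subobject may share an address.
bool subobject_shares_address() {
    Wrapper w{1, 2};
    return static_cast<void*>(&w) == static_cast<void*>(&w.first);
}
```

All three functions return true for any conforming implementation; none of them involves memory mapping.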
The C++ standard does not define mmap, or any other method of mapping memory. The C++ standard is only concerned with one view of memory. If the system uses virtual memory, then the standard is only concerned with virtual memory. No relation between virtual and physical memory is specified, as far as I know.
What the standard says about memory:
The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.
What the standard says about objects:
Unless an object is a bit-field or a base class subobject of zero size, the address of that object is the address of the first byte it occupies. Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses.
So, when you ask:
Is it undefined behavior to have two pointers with different values referring to the same object?
The two premises are contradictory. You can never have two pointers with different values referring to the same object. What you have is two different objects from the point of view of the standard, even if both virtual addresses are mapped to the same physical memory.
If we were to assume that, in the following code, the pointers a and b are magically mapped to the same physical memory:
int *a, *b; // initialize with magic mapping of your choice
*a = 1;
if (a != b) {
    *b = 2;
    std::cout << *a; // what is the value of *a?
}
As far as the standard is concerned, *a and *b are different objects. They must be, because they have different addresses. A compiler is free to optimize the read of *a away and use the constant 1, because nothing other than *b, which is an unrelated object, is modified between *a = 1 and the read of *a.
So, if the compiler chooses to optimize and use a constant, the output will be 1. But if the memory is actually read, and the virtual address is actually mapped to physical memory to which 2 was just written, the output could be different. I don't know if it's explicitly undefined behaviour, but it's definitely unspecified at the very least.
Memory mapping is specified by the implementation, and so, the implementation specifies how memory mapped objects behave.
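To make the reasoning about the snippet above easier to try out, here is the same store/load sequence wrapped in a function. The name store_then_load and the -1 return value for equal pointers are choices made for this sketch only.

```cpp
#include <cassert>

// Writes 1 through a, then (if the pointers differ) writes 2 through b
// and reads *a back. Returns -1 when both pointers compare equal.
int store_then_load(int* a, int* b) {
    assert(a != nullptr && b != nullptr);
    *a = 1;
    if (a != b) {
        *b = 2;
        return *a;  // may be folded to the constant 1 by the argument above
    }
    return -1;
}
```

Called with the addresses of two ordinary int variables, this returns 1. Called with two differently valued pointers that the operating system has mapped to the same physical page, it could return either 1 or 2, depending on whether the read of *a is actually performed.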