
What is the rationale behind returning unique addresses for allocations of zero size in C++?


Background: the C11 standard says about malloc (7.22.3 Memory management functions; 7.20.3 in C99):

If the size of the space requested is zero, the behavior is implementation defined: either a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object.

That is, as I see it, malloc always succeeds for allocations of zero size, since the only thing you can do with the pointer from a zero-sized allocation is pass it to some other memory allocation function such as free:

  • if malloc returns NULL, free(NULL) is ok so this can be considered a success,
  • if it returns some other value, that's also a success (because it isn't NULL), the only condition is that free on the value should also work.

Also, the same section of the C standard does not specify that the addresses returned from malloc must be unique, only that they must point to disjoint memory regions:

The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated). The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.

All objects of zero size are disjoint AFAICT, and that would mean that malloc can return the same pointer for multiple zero-sized allocations (e.g. NULL would be fine), or different pointers each time, or the same pointer for some, etc.
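
A minimal sketch of that reading, assuming a hosted implementation: a null result only counts as a failure when the requested size is nonzero. The helper names malloc_failed and roundtrip are hypothetical and exist only for illustration.

```cpp
#include <cstdlib>

// Hypothetical helper: true only if the result must be treated as a failure.
// For n == 0 a null pointer is a permitted outcome, so it is never a failure.
bool malloc_failed(void* p, std::size_t n) {
    return p == nullptr && n != 0;
}

// Hypothetical helper: allocate and release n bytes, reporting success.
bool roundtrip(std::size_t n) {
    void* p = std::malloc(n);
    bool ok = !malloc_failed(p, n);
    std::free(p);  // free(NULL) is a no-op, so this is safe either way
    return ok;
}
```

Under this interpretation roundtrip(0) reports success whether the implementation returns NULL or a non-null pointer.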

Then C++ came along with its own raw memory allocation functions (the first since C++98, the aligned overload since C++17):

void* operator new(std::size_t size);
void* operator new(std::size_t size, std::align_val_t alignment);

Note that these functions only return raw memory: they do not create or initialize any objects of any type AFAICT.

You call them like this:

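#include <iostream>
#include <new>
int main() {
    // Request zero bytes of raw storage; no object is created.
    void* ptr = operator new(std::size_t{0});
    std::cout << ptr << std::endl;  // non-null on success
    // Sized deallocation (C++14); the size must match the request.
    operator delete(ptr, std::size_t{0});
    return 0;
}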

The [new.delete.single] section of the C++17 standard explains them, but the key guarantee as I see it is given in [basic.stc.dynamic.allocation]:

Even if the size of the space requested is zero, the request can fail. If the request succeeds, the value returned shall be a non-null pointer value (7.11) p0 different from any previously returned value p1, unless that value p1 was subsequently passed to an operator delete. Furthermore, for the library allocation functions in 21.6.2.1 and 21.6.2.2, p0 shall represent the address of a block of storage disjoint from the storage for any other object accessible to the caller. The effect of indirecting through a pointer returned as a request for zero size is undefined.

That is, they must always return distinct pointers on success. That's a big change from malloc.
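
A small sketch of what this guarantee implies for two zero-size allocations that are alive at the same time. The function name zero_size_news_are_distinct is hypothetical; the check assumes the default library allocation functions have not been replaced by nonconforming ones.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: two live zero-size allocations must be non-null
// and must compare unequal, per [basic.stc.dynamic.allocation].
bool zero_size_news_are_distinct() {
    void* a = ::operator new(std::size_t{0});
    void* b = ::operator new(std::size_t{0});
    bool distinct = (a != nullptr) && (b != nullptr) && (a != b);
    ::operator delete(a);
    ::operator delete(b);
    return distinct;
}
```

The same check written with malloc(0) would not be portable, because both calls could legitimately return NULL.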

My question is: What is the rationale behind this change? (that is, behind returning unique addresses for allocations of zero size in C++)

Ideally the answer would be just a link to the paper (or some other source) that explored the alternatives and motivated their semantics. Typically I go for The Design and Evolution of C++ for these C++98 questions, but Section 10 (Memory Management) does not mention anything about it. Otherwise, some sort of authoritative reference would be nice.


Disclaimer: I asked this on reddit, but I did not ask nicely enough, so I did not get any useful answers. I would kindly ask that, if you only have a hypothesis, you feel free to post it as an answer but mention that it is only a hypothesis.

Also, on reddit people went on and on about zero-sized types, whether I have a proposal to change the standard, etc. This question is about the semantics of the raw memory allocation functions when passed a size equal to zero. If topics like zero-sized types are relevant for your answer, please include them! But try not to get too derailed with tangential issues.

Also, people on reddit threw around arguments like "that's for optimization purposes" without really being able to mention anything more concrete. I'd expect something more concrete than "because optimizations" in an answer. For example, one redditor mentioned aliasing optimizations, but I wondered which kind of aliasing optimizations apply to pointers that cannot be dereferenced, and wasn't able to get anyone to comment on that. So if you are going to mention optimizations, a small example that demonstrates them would enrich the discussion.

gnzlbg asked May 30 '18 17:05



2 Answers

The problem is that objects (no matter their size) in C++ must have a unique identity. So different coexisting objects (no matter their size) must have different addresses, since two pointers that compare equal are assumed to point to the same object.

If you allow zero-sized objects to share an address, you can no longer tell whether two addresses refer to the same object.
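
As an illustration of the identity argument (a sketch, not something taken from the standard's rationale), the language already applies the same rule to empty class types: they have nonzero size, so two separate variables never share an address. The names Empty and distinct_empty_objects are hypothetical.

```cpp
// A class with no data members still occupies at least one byte as a
// complete object, so each instance has its own address.
struct Empty {};

static_assert(sizeof(Empty) >= 1, "complete objects have nonzero size");

// Hypothetical helper: two coexisting objects compare unequal by address.
bool distinct_empty_objects() {
    Empty a;
    Empty b;
    return &a != &b;
}
```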


Many comments raise the "new does not return objects" issue.

Please FORGET OOP terminology in this context:

The C++ specification has a precise definition of what the word "object" means.

CPP Reference:Object

In particular:

C++ programs create, destroy, refer to, access, and manipulate objects. An object, in C++, is a region of storage that has

  • size (can be determined with sizeof);
  • alignment requirement (can be determined with alignof);
  • storage duration (automatic, static, dynamic, thread-local);
  • lifetime (bounded by storage duration or temporary);
  • type;
  • value (which may be indeterminate, e.g. for default-initialized non-class types);
  • optionally, a name.

The following entities are not objects: value, reference, function, enumerator, type, non-static class member, bit-field, template, class or function template specialization, namespace, parameter pack, and this.

A variable is an object or a reference that is not a non-static data member, that is introduced by a declaration.

Objects are created by definitions, new-expressions, throw-expressions, when changing the active member of a union, and where temporary objects are required.

Emilio Garavaglia answered Nov 15 '22 12:11



The reason is simply that code should not require special handling of boundary conditions. Many, I would say most, algorithms have to deal with zero-sized objects as boundary conditions. Less common is the algorithm that compares pointers to objects to see if they are the same object, but this should still work even for zero-sized objects.
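
A sketch of the kind of code this has in mind, under the assumption that allocations are identified by their address: a tracker keyed by pointer needs no special case for size zero, because each successful zero-size allocation still yields its own key. The class Tracker and its members are hypothetical.

```cpp
#include <cstddef>
#include <new>
#include <set>

// Hypothetical tracker that identifies live allocations by address.
class Tracker {
public:
    // Allocates n bytes (n may be zero) and records the pointer.
    void* allocate(std::size_t n) {
        void* p = ::operator new(n);
        live_.insert(p);
        return p;
    }

    // Releases a tracked pointer; returns false if it was not tracked.
    bool release(void* p) {
        if (live_.erase(p) == 0) return false;
        ::operator delete(p);
        return true;
    }

    std::size_t live_count() const { return live_.size(); }

private:
    std::set<void*> live_;
};
```

If zero-size allocations could share an address, the set would collapse them into one entry and the second release would be reported as untracked.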

However, your question assumes that this is a change. Apart from a brief hiatus in the late 1980s, all C and C++ implementations that I am aware of have always behaved like this.

The original C compiler by dmr behaved like this, but around 1987 the draft C standard specified that malloc of a zero-sized object return NULL. This was truly bizarre, and although the final C89 standard made it implementation-defined, I have never since encountered an implementation that did this horrible thing.

I talk more about this in my blog in the section "Malloc Madness".

Andrew W. Phillips answered Nov 15 '22 10:11
