Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing a std::vector like container without undefined behavior

It may surprise some coders and, as surprising as it can be, it is not possible to implement std::vector without non-standard support of the compilers. The problem essentially resides on the ability to perform pointer arithmetic on a raw storage region. The paper, p0593: Implicit creation of objects for low-level object manipulation, that appears in @ShafikYaghmour answer, exposes clearly the problematic and proposes modification of the standard in order to make implementation of vector like container and other law level programming technique easier.

Nevertheless I was wondering if there were no work around to implement a type equivalent to std::vector only using what is provided by the language without any use of the standard library.

The objective is to construct vector elements, one by one, in a raw storage region, and be able to access those elements using an iterator. This would be equivalent to a sequence of push_back on a std::vector.

To get an idea of the problem, bellow a simplification of the operations that are performed on the implementation of std::vector in the libc++ or libstdc++:

void access_value(std::string x);

std::string s1, s2, s3;
//allocation
auto p=static_cast<std::string*>(::operator new(10*sizeof(std::string)));

//push_back s1
new(p) std::string(s1);
access_value(*p);//undefined behavior, p is not a pointer to object

//push_back s2
new(p+1) std::string(s2);//undefined behavior
        //, pointer arithmetic but no array (neither implicit array of size 1)
access_value(*(p+1));//undefined behavior, p+1 is not a pointer to object

//push_back s2
new(p+2) std::string(s3);//undefined behavior
        //, pointer arithmetic but no array
access_value(*(p+2));//undefined behavior, p+2 is not a pointer to object

My idea is to use a union that never initialize its member.

//almost trivialy default constructible
template<class T>
union atdc{
  char _c;
  T value;
  atdc ()noexcept{ }
  ~atdc(){}
};

The raw storage will be initialized with an array of this union type, and pointer arithmetic is always performed on this array. Then elements are constructed on the inactive member of the union at each push_back.

std::string s1, s2, s3;
auto p=::operator new(10*sizeof(std::string));
auto arr = new(p) atdc<std::string>[10];
//pointer arithmetic on arr is allowed

//push_back s1
new(&arr[0].value) std::string(s1); //union member activation
access_value(arr[0].value);

//push_back s2
new(&arr[1].value) std::string(s2);
access_value(arr[1].value);

//push_back s2
new(&arr[2].value) std::string(s2);
access_value(arr[2].value);

Is there any undefined behavior in this code above?

like image 607
Oliv Avatar asked Oct 25 '18 19:10

Oliv


1 Answers

This is topic that is under active discussion, we can see this in the proposal p0593: Implicit creation of objects for low-level object manipulation. This is a pretty solid discussion of the issues and why they are not fixable without changes. If you have different approaches or strong views on the approaches being considered you may want to reach out to the proposals authors.

It includes this discussion:

2.3. Dynamic construction of arrays

Consider this program that attempts to implement a type like std::vector (with many details omitted for brevity):

....

In practice, this code works across a range of existing implementations, but according to the C++ object model, undefined behavior occurs at points #a, #b, #c, #d, and #e, because they attempt to perform pointer arithmetic on a region of allocated storage that does not contain an array object.

At locations #b, #c, and #d, the arithmetic is performed on a char*, and at locations #a, #e, and #f, the arithmetic is performed on a T*. Ideally, a solution to this problem would imbue both calculations with defined behavior.

  1. Approach

The above snippets have a common theme: they attempt to use objects that they never created. Indeed, there is a family of types for which programmers assume they do not need to explicitly create objects. We propose to identify these types, and carefully carve out rules that remove the need to explicitly create such objects, by instead creating them implicitly.

The approach using the adc union has the problem that we expect to be able to access the contained data via a pointer T* i.e. via std::vector::data. Accessing the union as a T* would violate the strict aliasing rules and therefore be undefined behavior.

like image 56
Shafik Yaghmour Avatar answered Sep 28 '22 07:09

Shafik Yaghmour