Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why the libc++ std::vector internally keeps three pointers instead of one pointer and two sizes?

Tags:

I'm looking at the implementation of std::vector in libc++ and I noticed that it internally keeps three pointers (one to the begin, one the end, and one to the end of the allocated memory) instead of what I'd instinctively do, i.e., one pointer to the begin and two size and capacity members.

Here is the code from libc++'s <vector> (ignore the compressed pair, I know what it means).

pointer                                    __begin_; pointer                                    __end_; __compressed_pair<pointer, allocator_type> __end_cap_; 

I noticed that also other standard libraries do the same (e.g. Visual C++). I don't see any particular reason why this solution should be faster than the other one, but I might be wrong.

So is there a particular reason the "three pointers" solution is preferred to the "pointer + sizes" one?

like image 499
gigabytes Avatar asked May 24 '15 09:05

gigabytes


People also ask

Can vector store pointers?

You can store pointers in a vector just like you would anything else. Declare a vector of pointers like this: vector<MyClass*> vec; The important thing to remember is that a vector stores values without regard for what those values represent.

Is STD vector a pointer?

std::vector::dataReturns a direct pointer to the memory array used internally by the vector to store its owned elements.

Do I need to delete pointer vectors?

Yes, the code has a memory leak unless you delete the pointers. If the foo class owns the pointers, it is its responsibility to delete them. You should do this before clearing the vector, otherwise you lose the handle to the memory you need to de-allocate.

How do you define a pointer vector?

A vector of pointers is similar to a vector of objects. The main differences are as follows: The values of the vector of pointers, have to be addresses of objects declared from or instantiated from the class. Assume that the class name is TheCla, then the template argument of the vector has to be “TheCla*”.


2 Answers

It's because the rationale is that performance should be optimized for iterators, not indices.
(In other words, performance should be optimized for begin()/end(), not size()/operator[].)
Why? Because iterators are generalized pointers, and thus C++ encourages their use, and in return ensures that their performance matches those of raw pointers when the two are equivalent.

To see why it's a performance issue, notice that the typical for loop is as follows:

for (It i = items.begin(); i != items.end(); ++i)     ... 

Except in the most trivial cases, if we kept track of sizes instead of pointers, what would happen is that the comparison i != items.end() would turn into i != items.begin() + items.size(), taking more instructions than you'd expect. (The optimizer generally has a hard time factoring out the code in many cases.) This slows things down dramatically in a tight loop, and hence this design is avoided.

(I've verified this is a performance problem when trying to write my own replacement for std::vector.)


Edit: As Yakk pointed out in the comments, using indices instead of pointers can also result in the generation of a multiplication instruction when the element sizes aren't powers of 2, which is pretty expensive and noticeable in a tight loop. I didn't think of this when writing this answer, but it's a phenomenon that's bitten me before (e.g. see here)... bottom line is, in a tight loop everything matters.

like image 150
user541686 Avatar answered Oct 21 '22 17:10

user541686


It's more convenient for implementers.

Storing size makes exactly one operation easier to implement: size()

size_t size() { return size_; } 

on the other hand, it makes other harder to write and makes reusing code harder:

iterator end() { return iterator(end_); } // range version iterator end() { return iterator(begin_ + size_); } // pointer + size version  void push_back(const T& v) // range version {     // assume only the case where there is enough capacity     ::new(static_cast<void*>(end_)) T(v);     ++end_; }  void push_back(const T& v) // pointer + size version {     // assume only the case where there is enough capacity     ::new(static_cast<void*>(begin_ + size_)) T(v);     // it could use some internal `get_end` function, but the point stil stands:     // we need to get to the end     ++size_; } 

If we have to find the end anyway, we could store it directly - it's more useful than size anyway.

like image 31
milleniumbug Avatar answered Oct 21 '22 17:10

milleniumbug