I've read that creating or copying a std::shared_ptr involves some overhead (an atomic increment of the reference counter, etc.).
But what about creating a std::weak_ptr from it instead:
Obj * obj = new Obj();
// fast
Obj * o = obj;
// slow
std::shared_ptr<Obj> a(o);
// slow
std::shared_ptr<Obj> b(a);
// slow ?
std::weak_ptr<Obj> c(b);
I was hoping for better performance, but I know that the shared pointer still has to increment the weak reference counter... So is this still as slow as copying one shared_ptr into another?
This is from my days with game engines. The story goes:
We need a fast shared pointer implementation, one that doesn't thrash the cache (caches are smarter now, btw).
A normal pointer:
XXXXXXXXXXXX....
^--pointer to data
Our shared pointer:
iiiiXXXXXXXXXXXXXXXXX...
^   ^---pointer stored in shared pointer
|
+---the start of the allocation; the allocation is sizeof(unsigned int)+sizeof(T)
The unsigned int* used for the count is at ((unsigned int*)ptr)-1.
That way a "shared pointer" is pointer sized, and the data it contains is the pointer to the actual data. So (because template => inline, and any compiler will inline an operator that returns a data member) it was the same "overhead" for access as a normal pointer.
Creation of pointers took about 3 more CPU instructions than normal (the read of location -4 is one operation, the add of 1 is another, and the write back to location -4 is the third).
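A minimal sketch of that layout, assuming 4-byte alignment suffices for T (as it did on that system); the names make and counter are mine, not the engine's:

#include <cstdlib>
#include <new>
#include <utility>

template <typename T>
class shared_pointer {
public:
    // One block holds the count followed by the object:
    // sizeof(unsigned int) + sizeof(T).
    template <typename... Args>
    static shared_pointer make(Args&&... args) {
        void* block = std::malloc(sizeof(unsigned int) + sizeof(T));
        unsigned int* count = static_cast<unsigned int*>(block);
        *count = 1;
        // In-place new constructs T just past the counter.
        T* data = new (count + 1) T(std::forward<Args>(args)...);
        return shared_pointer(data);
    }

    shared_pointer(const shared_pointer& other) : ptr(other.ptr) {
        ++*counter();  // read location -4, add 1, write it back
    }

    ~shared_pointer() {
        if (--*counter() == 0) {   // a decrement, that's it, unless we're last
            ptr->~T();
            std::free(counter());  // the counter is the start of the block
        }
    }

    // Dereference costs the same as a raw pointer: the compiler
    // inlines an operator that just returns the data member.
    T& operator*() const { return *ptr; }
    T* operator->() const { return ptr; }

private:
    explicit shared_pointer(T* p) : ptr(p) {}
    unsigned int* counter() const {
        return reinterpret_cast<unsigned int*>(ptr) - 1;  // ((unsigned int*)ptr)-1
    }

    T* ptr;  // the only data member: the shared_pointer itself is pointer sized
};

Copy assignment is omitted for brevity; a real version needs it too, plus the alignment padding discussed further down.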
Now we'd only use weak-pointers when we were debugging (so we'd compile with DEBUG defined (macro definition)), because then we'd like to see all allocations and what's going on and such. It was useful.
The weak-pointers must know when what they point to is gone, NOT keep the thing they point to alive (in my case, if the weak pointer kept the allocation alive the engine would never get to recycle or free any memory; then it's basically a shared pointer anyway).
So each weak-pointer has a bool, alive or something, and is a friend of shared_pointer.
When debugging, our allocation looked like this:
vvvvvvvviiiiXXXXXXXXXXXXX.....
^       ^   ^---the pointer we stored (to the data)
|       +---that pointer -4 bytes = ref counter
+---initial allocation is now sizeof(linked_list<weak_pointer<T>*>)+sizeof(unsigned int)+sizeof(T)
The linked list structure you use depends on what you care about. We wanted to stay as close to sizeof(T) as we could (we managed memory using the buddy algorithm), so we stored a pointer to the weak_pointer and used the XOR trick... good times.
Anyway: the weak pointers to something the shared_pointers point to are put in a list, stored somehow in the "v"s above.
When the reference count hits zero, you go through that list (which is a list of pointers to the actual weak_pointers; they remove themselves when deleted, obviously) and set alive=false (or something) on each weak_pointer.
The weak_pointers now know that what they point to is no longer there (so they threw when de-referenced).
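A rough sketch of that debug-mode mechanism, with hypothetical names, using std::list in place of the XOR-linked list for clarity:

#include <list>
#include <stdexcept>

template <typename T> class shared_pointer;  // as sketched above

template <typename T>
class weak_pointer {
    friend class shared_pointer<T>;  // the shared side flips 'alive'
    template <typename U>
    friend void notify_weak_pointers(std::list<weak_pointer<U>*>& watchers);
public:
    T& operator*() const {
        if (!alive) throw std::runtime_error("dangling weak_pointer");
        return *ptr;
    }
private:
    T* ptr = nullptr;
    bool alive = false;  // set to false when the ref count hits zero
};

// Called when the reference count hits zero; 'watchers' lives in the
// header of the allocation (the "v"s in the diagram above).
template <typename T>
void notify_weak_pointers(std::list<weak_pointer<T>*>& watchers) {
    for (weak_pointer<T>* w : watchers)
        w->alive = false;  // each weak_pointer now knows its target is gone
}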
In this example
There is no overhead (the alignment was 4 bytes on that system; 64-bit systems tend to like 8-byte alignments, so union the ref-counter with an int[2] in there to pad it out in that case). Remember this involves in-place news (nobody downvote because I mentioned them :P) and such. You need to make sure the struct you impose on the allocation matches what you allocated and made. Compilers can align stuff for themselves (hence int[2], not int,int).
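A hypothetical header struct for the 64-bit case might look like this:

struct control_header {
    union {
        unsigned int ref_count;  // the count actually used
        int pad[2];              // int[2], not int,int: forces sizeof == 8
    };
};
// T is constructed immediately after this header, so it stays
// 8-byte aligned as long as the allocator returns aligned blocks.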
You can de-reference the shared_pointer with no overhead at all.
New shared pointers being made do not thrash the cache at all and require 3 CPU instructions. They are not very... pipeline-able, but the compiler will inline getters and setters always (if not probably always :P), and there'll be something around the call-site that can fill the pipeline.
The destructor of a shared pointer also does very little (decrements, that's it), so it is great!
High performance note
If you have a situation like:
void f() {
    shared_pointer<T> ptr;
    g(ptr);
}
There's no guarantee that the optimiser will dare to skip the adds and subtractions that come from passing a shared_pointer "by value" to g.
This is where you'd use a normal reference (which is implemented as a pointer), so you'd do g(ptr.extract_reference()); instead - again, the compiler will inline the simple getter.
Now you have a T&. Because ptr's scope entirely surrounds the call to g (assuming g has no side-effects and so forth), that reference will be valid for the duration of g.
Deleting references is very ugly and you probably couldn't do it by accident (we relied on this fact).
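A sketch of that call site; extract_reference is the getter named above, and Widget is just a stand-in type:

// shared_pointer<T> as sketched earlier, extended with the getter:
template <typename T>
class shared_pointer {
public:
    T& extract_reference() const { return *ptr; }  // trivial, always inlined
    // ... constructors, destructor, counter() as before ...
private:
    T* ptr = nullptr;
};

struct Widget { int value = 0; };

void g(Widget& w) { w.value += 1; }  // takes a plain reference

void f(shared_pointer<Widget>& owner) {
    // owner's scope entirely surrounds the call, so the reference
    // stays valid for the duration of g; no ref-count traffic here.
    g(owner.extract_reference());
}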
In hindsight
I should have created a type called "extracted_pointer" or something, it'd be really hard to type that by mistake for a class member.
The weak/shared pointers used by stdlib++
http://gcc.gnu.org/onlinedocs/libstdc++/manual/shared_ptr.html
Not as fast...
But don't worry about the odd cache miss unless you're making a game engine that needs to run a decent workload at >120fps with ease :P Still miles better than Java.
The stdlib way is nicer. Each object has its own allocation and job. With our shared_pointer it was a true case of "trust me, it works, try not to worry about how" (not that it is hard) because the code looked really messy.
If you undid the... whatever they've done to the names of variables in their implementation, it'd be far easier to read. See Boost's implementation, as that document suggests.
Other than the variable names, the GCC stdlib implementation is lovely. You can read it easily, and it does its job properly (following the OO principle), but it is a little slower and MAY thrash the cache on crappy chips these days.
UBER high performance note
You may be thinking: why not have XXXX...XXXXiiii (the reference count at the end)? Then you'll get the alignment that's best for the allocator!
Answer:
Because having to do pointer+sizeof(T) may not be one CPU instruction! (Subtracting 4 or 8 is something a CPU can do easily, simply because it makes sense and it'll be doing it a lot.)
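To make the arithmetic concrete, here is a sketch of the two address computations (whether the second form folds into a single addressing-mode operation depends on the target CPU):

// Count stored before the object: a small constant offset,
// the same for every T (-4 or -8 bytes).
template <typename T>
unsigned int* count_before(T* p) {
    return reinterpret_cast<unsigned int*>(p) - 1;
}

// Count stored after the object: the offset is sizeof(T), which
// varies per type and may not suit the addressing modes as well.
template <typename T>
unsigned int* count_after(T* p) {
    return reinterpret_cast<unsigned int*>(
        reinterpret_cast<char*>(p) + sizeof(T));
}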
In addition to Alec's very interesting description of the shared/weak_ptr system used in his previous projects, I wanted to give a little more detail on what is likely to be happening for a typical std::shared_ptr/weak_ptr implementation:
// slow
std::shared_ptr<Obj> a(o);
The main expense in the above construction is to allocate a block of memory to hold the two reference counts. No atomic operations need be done here (aside from what the implementation may or may not do under operator new).
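Note that std::make_shared avoids this separate allocation by placing the object and both reference counts in a single block of memory:

auto a1 = std::make_shared<Obj>();  // one allocation: Obj and both counts together
std::shared_ptr<Obj> a2(new Obj()); // two allocations: the Obj, then the control block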
// slow
std::shared_ptr<Obj> b(a);
The main expense in the copy construction is typically a single atomic increment.
// slow ?
std::weak_ptr<Obj> c(b);
The main expense in this weak_ptr constructor is typically a single atomic increment. I would expect the performance of this constructor to be nearly identical to that of the shared_ptr copy constructor.
Two other important constructors to be aware of are:
std::shared_ptr<Obj> d(std::move(a)); // shared_ptr(shared_ptr&&);
std::weak_ptr<Obj> e(std::move(c));   // weak_ptr(weak_ptr&&);
(And matching move assignment operators as well)
The move constructors do not require any atomic operations at all. They just transfer the pointer to the reference counts from the rhs to the lhs, and set the rhs == nullptr.
The move assignment operators require an atomic decrement only if the lhs != nullptr prior to the assignment. The bulk of the time (e.g. within a vector<shared_ptr<T>>) the lhs == nullptr prior to a move assignment, and so there are no atomic operations at all.
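For example (Obj as in the question):

std::shared_ptr<Obj> src = std::make_shared<Obj>();
std::shared_ptr<Obj> byCopy = src;             // copy: atomic increment
std::shared_ptr<Obj> byMove = std::move(src);  // move: no atomics, src is now null

std::shared_ptr<Obj> target;
target = std::move(byMove);  // lhs == nullptr here: no atomic operations at all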
The latter (the weak_ptr move members) are not actually C++11, but are being handled by LWG 2315. However, I would expect them to already be implemented by most implementations (I know they are already implemented in libc++).
These move members will be used when scooting smart pointers around in containers, e.g. under vector<shared_ptr<T>>::insert/erase, and can have a measurable positive impact compared to the use of the smart pointer copy members.
I point it out so that you will know that if you have the opportunity to move instead of copy a shared_ptr/weak_ptr, it is worth the trouble to type the few extra characters to do so.
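For instance, a sketch of filling a container (the function name fill is just for illustration):

#include <memory>
#include <vector>

void fill(std::vector<std::shared_ptr<Obj>>& v) {
    std::shared_ptr<Obj> p = std::make_shared<Obj>();
    v.push_back(p);             // copy: one atomic increment
    v.push_back(std::move(p));  // move: no atomic operations, p becomes null
}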