Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are some behavioral anomalies with data structures that use reference counting in STL?

Scott Meyer in "Effective STL" says that one of the things to think about while deciding which data structure to use is whether the container uses reference counting or not. He says that there are some behavioral anomalies with this approach.

What are some of them? Why do containers like 'string' and 'rope' have anomalous behaviors?

like image 394
unj2 Avatar asked Aug 30 '12 13:08

unj2


People also ask

Does C++ use reference counting?

Reference counting is one such technique. This method is simply keeping an extra counter along with each object that is created. The counter is the number of references that exist to the object, in the C/C++ case this would how many pointers refer to this object.

What is a reference counted pointer?

[...] Reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object or block of memory. It is typically used as a means of deallocating objects which are no longer referenced. Wikipedia.

What is reference counter C++?

Reference-count garbage collection is a powerful technique for managing memory that helps prevent objects from being deleted accidentally or more than once. The technique is not limited to C++ code and, despite its name, is unrelated to the C++ concept of reference variables.


3 Answers

As others have said, the typical example is std::string. Besides performance issues with locking in multithreaded programs, there are thread-less problems with reference-counted strings. Imagine this:

string s = "hello";
string t = s;                   // s and t share data
char &c = t[0];                 // copy made here, since t is non-const

The problem is that the non-const operator[] must make a copy of the string if it's shared, since the returned reference could be used later to modify the string (you can assign it to a non-reference char, but operator[] doesn't know it should behave any differently). On the other hand the const operator[] should avoid making a copy, because that would eliminate all benefits of reference counting (it would mean you always make a copy in practice).

const char &get_first(const string &s) {
    return s[0];                // no copy, s is const
}

string s = "hello";
string t = s;                   // s and t share data
const char &c1 = get_first(t);  // no copy made here
const char &c2 = t[0];          // copy made, since t is non-const
// c1 just got invalidated (in fact, it's pointing at s[0], not t[0]).
s[0] = 'X';
printf("%c, %c\n", c1, c2);     // outputs "X, h"

As you can see, this distinction is confusing and may cause really unexpected behavior.

Here's an old article about copy-on-write semantics and its impact on performance: http://www.gotw.ca/gotw/045.htm.

Here's the proposal with motivation to change std::string to not be reference-counted in C++11 standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534.html. This is what the above example is based on.

like image 116
DS. Avatar answered Oct 20 '22 16:10

DS.


As a sample one anomaly that can happen with reference counted strings, in particular strings with "subpart" handling (with a start/end slice), is "unfortunate locking".

Let's imagine you allocate memory for the entire text of a file. You then parse your file and use some "slice()", "left()", "mid()" or equivalent method. You might end locking the entire string for the file whereas maybe only a very small part of it contains actually textual data (the remain being already parsed numbers, punctuation or whatever). You might therefore have used more memory than necessary in the end, while controlling more easily peak usage. There might be a second problem in this case if you use multi-threading and use intensively some of the strings in various threads: unnecessary memory contention, the reference count of the strings might get incremented/decremented all the times and the atomicity might get int the way, slowing done all the threads.

There is nothing against reference counting though, as long as you know the potential issues in your application and prevent them (in this case simply make the strings "alone" by copying them).

like image 1
armel Avatar answered Oct 20 '22 18:10

armel


As a general rule, reference counting suffers from the usual multi-threading issues of race conditions, deadlocks or excessive synchronization to avoid either.

Then you have problems of context which would normally require closure-like behaviour i.e. objects may need to be captured after they have apparently gone out of scope, but this may be avoided by STL, I am not an STL expert.

There's a discussion here which talks about various baroque edge-cases associated with smart pointers: http://www.velocityreviews.com/forums/t689414-c-primer-4th-edition-reference-counting-smart-pointers.html

like image 1
emperorz Avatar answered Oct 20 '22 18:10

emperorz