Confusion about Copy-On-Write and shared_ptr

I have searched the web and read through the Boost documentation about shared_ptr. There is a response on SO that says that shared_ptr for Copy-On-Write (COW) sucks and that TR1 has removed it from the string libraries. Most advice on SO says to use shared_ptr rather than regular pointers.

The documentation also talks about using shared_ptr::unique() to make a COW pointer, but I haven't found any examples.

Is the talk about having a smart pointer that performs COW for you or about having your object use a new shared_ptr to a cloned object then modifying the cloned object?

Example: Recipes & Ingredients

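#include <string>
#include <vector>
#include <boost/shared_ptr.hpp>

// Defined (not just forward-declared) because Ingredient holds one by value.
struct Nutrients { };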

struct Ingredient
{
    Ingredient(const std::string& new_title = std::string(""))
        : m_title(new_title)
        { ; }
    std::string m_title;
    Nutrients   ing_nutrients;
};

struct Milk : public Ingredient
{
    Milk() : Ingredient("milk") { ; }
};

struct Cream : public Ingredient
{
    Cream() : Ingredient("cream") { ; }
};

struct Recipe
{
    std::vector< boost::shared_ptr<Ingredient> > m_ingredients;
    void append_ingredient(boost::shared_ptr<Ingredient> new_ingredient)
    {
        m_ingredients.push_back(new_ingredient);
        return;
    }
    void replace_ingredient(const std::string& original_ingredient_title,
                            boost::shared_ptr<Ingredient> new_ingredient)
    {
        // Confusion here
    }
};

int main(void)
{
    // Create an oatmeal recipe that contains milk.
    Recipe  oatmeal;
    boost::shared_ptr<Ingredient> p_milk(new Milk);
    oatmeal.append_ingredient(p_milk);

    // Create a mashed potatoes recipe that contains milk
    Recipe  mashed_potatoes;
    mashed_potatoes.append_ingredient(p_milk);

    // Now replace the Milk in the oatmeal with cream
    // This must not affect the mashed_potatoes recipe.
    boost::shared_ptr<Ingredient> p_cream(new Cream);
    oatmeal.replace_ingredient(p_milk->m_title, p_cream);

    return 0;
}

The confusion is how to replace the 'Milk' in the oatmeal recipe with Cream and not affect the mashed_potatoes recipe.

My algorithm is:

  • Locate the pointer to the `Milk` ingredient in the vector.
  • Erase it.
  • Append the `Cream` ingredient to the vector.

How would a COW pointer come into play here?

Note: I am using MS Visual Studio 2010 on Windows NT, Vista and 7.

Thomas Matthews, asked Jun 05 '11 19:06


2 Answers

There are several questions bundled into one here, so bear with me if I don't address them in the order you would expect.

Most advice on SO says to use shared_ptr rather than regular pointers.

Yes and no. A number of SO users, unfortunately, recommend shared_ptr as if it were a silver bullet for all memory-management issues. It is not. Most advice talks about not using naked pointers, which is substantially different.

The real advice is to use smart managers: whether smart pointers (unique_ptr, scoped_ptr, shared_ptr, auto_ptr), smart containers (ptr_vector, ptr_map) or custom solutions for hard problems (based on Boost.MultiIndex, using intrusive counters, etc.).

You should pick the smart manager depending on the need. Most notably, if you do not need to share the ownership of an object, then you should not use a shared_ptr.

What is COW ?

COW (Copy-On-Write) is about sharing data to "save" memory and make copies cheaper, without altering the semantics of the program.

From a user's point of view, whether std::string uses COW or not does not matter. When a string is modified, all other strings are unaffected.

The idea behind COW is that:

  • if you are the sole owner of the data, you may modify it
  • if you are not, then you shall copy it, and then use the copy instead

It seems similar to shared_ptr, so why not ?

It is similar, but both are meant to solve different problems, and as a result they are subtly different.

The trouble is that since shared_ptr is meant to function seamlessly whether or not the ownership is shared, it is difficult for COW to implement the "if sole owner" test. Notably, the interaction with weak_ptr makes it difficult.
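To make the weak_ptr point concrete, here is a minimal single-threaded sketch. It assumes std::shared_ptr (C++11) as a stand-in for boost::shared_ptr; the function and struct names are made up for illustration. It shows that a weak_ptr does not count as an owner, so a "sole owner" check can pass just before a second owner appears.

```cpp
#include <cassert>
#include <memory>

// Owner counts observed before and after a weak_ptr is promoted.
struct Counts { long before; long after; };

// Hypothetical helper, for illustration only.
inline Counts weak_ptr_breaks_sole_owner_test()
{
    std::shared_ptr<int> owner = std::make_shared<int>(42);
    std::weak_ptr<int> observer = owner;   // observes, but is not counted as an owner

    Counts c;
    c.before = owner.use_count();          // 1: looks like sole ownership

    std::shared_ptr<int> second = observer.lock();  // a second owner appears
    assert(second);                        // the promotion succeeded
    c.after = owner.use_count();           // 2: an in-place write is now visible to 'second'
    return c;
}
```

A COW wrapper that tested the count at the point of `before` and then wrote in place would have its change observed through `second`.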

It is possible, obviously. The key is not to leak the shared_ptr, at all, and not to use weak_ptr (they are useless for COW anyway).
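As a rough sketch of what "not leaking the shared_ptr" could look like, the wrapper below keeps the pointer private and clones before a write unless it is the sole owner. CowPtr and its member names are hypothetical (not a Boost or standard type), std::shared_ptr stands in for boost::shared_ptr, and the use_count() test is only reliable here because the sketch is single-threaded.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical copy-on-write handle, for illustration only.
template <typename T>
class CowPtr
{
public:
    explicit CowPtr(const T& value) : m_data(std::make_shared<T>(value)) { }

    // Read access never copies.
    const T& read() const { return *m_data; }

    // Write access clones the payload first unless we are the sole owner.
    // (use_count() == 1 is used because unique() was deprecated in C++17.)
    T& write()
    {
        if (m_data.use_count() != 1)
            m_data = std::make_shared<T>(*m_data);
        return *m_data;
    }

    // True while both handles still point at the same payload.
    bool shares_with(const CowPtr& other) const { return m_data == other.m_data; }

private:
    std::shared_ptr<T> m_data;  // never handed out: no outside owners, no weak_ptr
};

// Copying shares the payload; the first write detaches the writer.
inline bool cow_demo()
{
    CowPtr<std::string> a(std::string("milk"));
    CowPtr<std::string> b = a;     // cheap copy, payload shared
    assert(a.shares_with(b));
    b.write() = "cream";           // b clones before modifying
    return !a.shares_with(b) && a.read() == "milk" && b.read() == "cream";
}
```

Note that every call to write() pays for the ownership test, which is the extra cost discussed in the next section.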

Does it matter ?

No, not really. It's been shown that COW is not that great anyway. Most of the time it's a micro-optimization... and a micro-pessimization at once. You may spare some memory (though it only works if you don't copy large objects), but you are complicating the algorithm, which may slow down the execution (you are introducing tests).

My advice would be not to use COW. And not to use those shared_ptr either.


Personally, I would either:

  • use boost::ptr_vector<Ingredient> rather than std::vector< boost::shared_ptr<Ingredient> > (you do not need sharing)
  • create an IngredientFactory that would create (and manage) the ingredients and return an Ingredient const&; the factory should outlive any Recipe.

EDIT: following Xeo's comment, it seems the last item (IngredientFactory) is quite laconic...

In the case of the IngredientFactory, the Recipe object will contain a std::vector<Ingredient const*>. Note the raw pointer:

  • Recipe is not responsible for the memory, but is given access to it
  • there is an implicit guarantee that the object pointed to will remain valid longer than the Recipe object

It is fine to use raw (naked) pointers, as long as you treat them like you would a reference. You just have to beware of potential nullity, and you're offered the ability to reseat them if you so wish -- and you trust the provider to take care of the lifetime / memory management aspects.
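A minimal sketch of that factory idea might look like the following. The get() member, the std::map pool and the simplified Ingredient are assumptions made for illustration; the answer only specifies that the factory owns the ingredients and hands out Ingredient const&.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Ingredient
{
    explicit Ingredient(const std::string& title) : m_title(title) { }
    std::string m_title;
};

// Hypothetical factory: owns every Ingredient and hands out const references.
class IngredientFactory
{
public:
    // Creates the ingredient on first request and reuses it afterwards.
    const Ingredient& get(const std::string& title)
    {
        std::map<std::string, Ingredient>::iterator it = m_pool.find(title);
        if (it == m_pool.end())
            it = m_pool.insert(std::make_pair(title, Ingredient(title))).first;
        return it->second;
    }

private:
    // References into a std::map stay valid when other elements are inserted.
    std::map<std::string, Ingredient> m_pool;
};

struct Recipe
{
    std::vector<const Ingredient*> m_ingredients;  // non-owning pointers

    void append_ingredient(const Ingredient& ingredient)
    {
        m_ingredients.push_back(&ingredient);
    }

    // Reseats the first pointer whose title matches; returns false if none does.
    bool replace_ingredient(const std::string& original_title,
                            const Ingredient& replacement)
    {
        for (std::size_t i = 0; i < m_ingredients.size(); ++i)
        {
            if (m_ingredients[i]->m_title == original_title)
            {
                m_ingredients[i] = &replacement;
                return true;
            }
        }
        return false;
    }
};

// The factory is declared first, so it outlives both recipes.
inline bool factory_demo()
{
    IngredientFactory factory;
    Recipe oatmeal;
    Recipe mashed_potatoes;
    oatmeal.append_ingredient(factory.get("milk"));
    mashed_potatoes.append_ingredient(factory.get("milk"));
    assert(oatmeal.m_ingredients[0] == mashed_potatoes.m_ingredients[0]);
    oatmeal.replace_ingredient("milk", factory.get("cream"));
    return oatmeal.m_ingredients[0]->m_title == "cream"
        && mashed_potatoes.m_ingredients[0]->m_title == "milk";
}
```

Because the pointers are const and non-owning, a Recipe can reseat them but cannot modify or delete the shared Ingredient.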

Matthieu M., answered Sep 20 '22 08:09


You have nothing to worry about. Each Recipe object has its own vector, so modifying one won't affect the other, even though both of them happen to contain pointers to the same objects. The mashed-potatoes recipe would only be affected if you changed the contents of the object that p_milk points at, but you're not doing that. You're modifying the oatmeal.m_ingredients object, which has absolutely no relation to mashed_potatoes.m_ingredients. They're two completely independent vector instances.
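One possible way to fill in replace_ingredient along these lines is sketched below. It assumes std::shared_ptr in place of boost::shared_ptr and a simplified Ingredient, and it overwrites the matching element in place rather than erasing and appending; the bool return value is also an addition for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Ingredient
{
    explicit Ingredient(const std::string& title) : m_title(title) { }
    std::string m_title;
};

struct Recipe
{
    std::vector< std::shared_ptr<Ingredient> > m_ingredients;

    void append_ingredient(std::shared_ptr<Ingredient> new_ingredient)
    {
        m_ingredients.push_back(new_ingredient);
    }

    // Swaps the pointer stored in this recipe's own vector.
    // The Ingredient object that was pointed to is not modified.
    bool replace_ingredient(const std::string& original_ingredient_title,
                            std::shared_ptr<Ingredient> new_ingredient)
    {
        for (std::size_t i = 0; i < m_ingredients.size(); ++i)
        {
            if (m_ingredients[i]->m_title == original_ingredient_title)
            {
                m_ingredients[i] = new_ingredient;
                return true;
            }
        }
        return false;
    }
};

// Replays the scenario from the question and reports whether
// mashed_potatoes still holds the original milk pointer.
inline bool mashed_potatoes_unaffected()
{
    Recipe oatmeal;
    Recipe mashed_potatoes;
    std::shared_ptr<Ingredient> p_milk(new Ingredient("milk"));
    oatmeal.append_ingredient(p_milk);
    mashed_potatoes.append_ingredient(p_milk);

    std::shared_ptr<Ingredient> p_cream(new Ingredient("cream"));
    bool replaced = oatmeal.replace_ingredient(p_milk->m_title, p_cream);
    assert(replaced);

    return oatmeal.m_ingredients[0] == p_cream
        && mashed_potatoes.m_ingredients[0] == p_milk;
}
```

No copy-on-write is involved: only the pointer held by oatmeal changes, and the milk object stays alive because mashed_potatoes and p_milk still own it.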

Rob Kennedy, answered Sep 22 '22 08:09