Does a copy constructor/operator/function need to make clear which copy variant it implements?

Tags:

language-agnostic

copy

deep-copy

shallow-copy

Yesterday I asked a question about copying objects in C#, and most answers focused on the difference between deep copy and shallow copy, and on the fact that it should be made clear which of the two copy variants a given copy constructor (or operator, or function) implements. I find this odd.

I have written a lot of software in C++, a language that relies heavily on copying, and I have never needed multiple copy variants. The only kind of copy operation I have ever used is the one I call "deep enough copy". It does the following:

If the object owns the member variable (cf. composition), the member is copied recursively.
If the object does not own the member variable (cf. aggregation), only the link is copied.
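A minimal C++ sketch of the "deep enough copy" described above, assuming composition is expressed as a std::unique_ptr member and aggregation as a plain non-owning pointer. The Engine, Driver and Car types are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical types used only to illustrate the two kinds of members.
struct Engine { int horsepower; };
struct Driver { std::string name; };

class Car {
public:
    Car(int horsepower, Driver* driver)
        : engine_(std::make_unique<Engine>(Engine{horsepower})), driver_(driver) {}

    // Composition: the Car owns its Engine, so a copy gets its own Engine.
    // Aggregation: the Car merely refers to a Driver, so only the pointer is copied.
    Car(const Car& other)
        : engine_(std::make_unique<Engine>(*other.engine_)), driver_(other.driver_) {}

    Car& operator=(const Car& other) {
        if (this != &other) {
            engine_ = std::make_unique<Engine>(*other.engine_);
            driver_ = other.driver_;
        }
        return *this;
    }

    Engine& engine() { return *engine_; }
    const Engine& engine() const { return *engine_; }
    Driver* driver() const { return driver_; }

private:
    std::unique_ptr<Engine> engine_;  // owned
    Driver* driver_;                  // not owned
};

// Usage: the copy has a separate Engine but shares the same Driver.
void car_demo() {
    Driver alice{"Alice"};
    Car original(150, &alice);
    Car copy = original;
    copy.engine().horsepower = 200;
    assert(original.engine().horsepower == 150);
    assert(copy.driver() == original.driver());
}
```

The copy assignment operator repeats the same per-member decision as the copy constructor; keeping the two consistent is the main maintenance cost of writing the copy by hand.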

Now, my question is threefold:

1) Does an object ever need more than one copy variant?
2) Does a copy function need to make clear which copy variant it implements?
3) As an aside, is there a better term for what I call "deep enough copy"? I asked a related question about the definition of the term "deep copy".


asked Jul 28 '10 07:07

Dimitri C.

2 Answers

An object only needs to copy what it needs to copy. Although this question is tagged language-agnostic and you mentioned C++, I prefer to explain in C# terms, since that's what I'm most familiar with. The concepts are similar, though.

Value types (such as structs and ints) live directly inside the object instance. Therefore, when you copy the object, you have no choice but to copy its value-type members as well, so you generally don't have to worry about them.
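A rough C++ analog, assuming members with value semantics (built-in types, and standard types such as std::string and std::vector that copy their contents): the compiler-generated copy constructor already produces an independent copy. The Point and Shape types are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical types: every member is held by value, so no custom copy code is needed.
struct Point { int x; int y; };

struct Shape {
    std::string name;           // std::string copies its characters
    std::vector<Point> points;  // std::vector copies its elements
    int layer;                  // built-in type, copied directly
};

// Usage: modifying the copy leaves the original untouched.
void shape_demo() {
    Shape original{"triangle", {{0, 0}, {1, 0}, {0, 1}}, 1};
    Shape copy = original;  // implicitly generated copy constructor
    copy.name += "-copy";
    copy.points[0].x = 42;
    copy.layer = 2;
    assert(original.name == "triangle");
    assert(original.points[0].x == 0);
    assert(original.layer == 1);
}
```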

Reference types are like pointers, and this is where it gets tricky. Depending on what the reference points to, you may or may not want a deep copy. A general rule of thumb: if a reference-type member depends on the state of the outer object, it should be cloned. If it doesn't, and never will, it doesn't have to be.

Another way of thinking about it: an object passed in to your object from the outside probably should NOT be cloned, while an object created BY your class should be.
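A sketch of what goes wrong when state created by the object itself is shared instead of cloned. The std::shared_ptr member is there only to make the compiler-generated (shallow) copy alias the data; SharedHistory and OwnHistory are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical: the history is created BY the object and depends on its state,
// but the compiler-generated copy only copies the shared_ptr (a shallow copy).
class SharedHistory {
public:
    SharedHistory() : history_(std::make_shared<std::vector<int>>()) {}
    void record(int value) { history_->push_back(value); }
    std::size_t size() const { return history_->size(); }
private:
    std::shared_ptr<std::vector<int>> history_;
};

// Same idea, but the copy operations clone the state the object created itself.
class OwnHistory {
public:
    OwnHistory() : history_(std::make_shared<std::vector<int>>()) {}
    OwnHistory(const OwnHistory& other)
        : history_(std::make_shared<std::vector<int>>(*other.history_)) {}
    OwnHistory& operator=(const OwnHistory& other) {
        if (this != &other) history_ = std::make_shared<std::vector<int>>(*other.history_);
        return *this;
    }
    void record(int value) { history_->push_back(value); }
    std::size_t size() const { return history_->size(); }
private:
    std::shared_ptr<std::vector<int>> history_;
};

// Usage: recording on the copy leaks into the original only in the shallow version.
void history_demo() {
    SharedHistory a;
    SharedHistory b = a;
    b.record(1);
    assert(a.size() == 1);  // surprising: a and b share one vector

    OwnHistory c;
    OwnHistory d = c;
    d.record(1);
    assert(c.size() == 0);  // c and d each have their own vector
}
```

In real code you would normally hold the std::vector by value and let the compiler generate the copy; the pointer only makes the aliasing visible.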

Okay, I lied, I will use some C++ since it will best explain what I mean.

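class MyClass {
    int foo;     // plain number: copied as-is
    char * bar;  // buffer allocated and owned by this object
    char * baz;  // buffer supplied by the caller; not owned

public:
    MyClass(int f, char * str) {
        this->foo = f;
        bar = new char[f];   // owned: a copy needs its own buffer
        this->baz = str;     // borrowed: a copy just shares the pointer
    }
};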

With this object, there are two string buffers to deal with. The first one, bar, is created and managed by the class itself, so when you clone the object you should allocate a new buffer (and copy the contents into it).

baz, on the other hand, should not be cloned. In fact, you can't clone it, since you don't have enough information to do so (the class doesn't even know how large that buffer is). The pointer should just be copied.

And, of course, foo is just a number. Just copy it, there's nothing else to worry about :)
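One possible way to write the copy operations for MyClass along these lines, as a sketch: foo is copied, bar gets a freshly allocated buffer with the same contents, and baz keeps pointing at the caller's buffer. The destructor, the copy-and-swap assignment operator and the owned()/borrowed() accessors are additions not present in the original example; the sketch also assumes foo is the length of bar, as the constructor implies.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

class MyClass {
    int foo;     // value: copied as-is (also the length of bar)
    char * bar;  // owned buffer: each copy needs its own
    char * baz;  // borrowed buffer: copies share the caller's pointer

public:
    // bar is value-initialized here so that copying it never reads garbage.
    MyClass(int f, char * str) : foo(f), bar(new char[f]()), baz(str) {}

    // Copy constructor: "deep enough" copy of the members.
    MyClass(const MyClass& other)
        : foo(other.foo), bar(new char[other.foo]), baz(other.baz) {
        std::copy(other.bar, other.bar + other.foo, bar);
    }

    // Copy-and-swap assignment: reuses the copy constructor above.
    MyClass& operator=(MyClass other) {
        std::swap(foo, other.foo);
        std::swap(bar, other.bar);
        std::swap(baz, other.baz);
        return *this;
    }

    // The object owns bar, so it must release it; baz belongs to the caller.
    ~MyClass() { delete[] bar; }

    char * owned() const { return bar; }
    char * borrowed() const { return baz; }
};

// Usage: the copy gets a new bar with the same contents, but the same baz.
void myclass_demo() {
    char caller_buffer[] = "hello";
    MyClass original(4, caller_buffer);
    original.owned()[0] = 'x';
    MyClass copy = original;
    assert(copy.owned() != original.owned());
    assert(copy.owned()[0] == 'x');
    assert(copy.borrowed() == original.borrowed());
}
```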

In summary, to answer your questions directly:

1) 99% of the time, no. There's only one way to copy that makes sense. What that way is, however, varies.
2) Not directly. Documenting it is a good idea, but anything internal should stay internal.
3) Just "deep copy". You should (Edit: ALMOST) never try to clone an object or pointer you don't control, so that's exempt from the rules :)

answered Sep 24 '22 23:09

Mike Caron

The distinction between "deep copy" and "shallow copy" makes sense as an implementation detail, but allowing it to leak beyond that generally indicates a flawed abstraction, which will likely manifest itself in other ways as well.

If an object Foo holds an object reference purely for the purpose of encapsulating immutable aspects, other than identity, of the object contained therein, then a correct copy of Foo may either contain a duplicate of the reference or a reference to a duplicate of the encapsulated object.

If an object Foo holds an object reference purely for the purpose of encapsulating mutable and immutable aspects of an object other than identity, but no reference to that object will ever be exposed to anything that would mutate it, the same situation applies.

If an object Foo holds an object reference purely for the purpose of encapsulating mutable and immutable aspects of an object other than identity, and the object in question is going to be mutated, then a correct copy of Foo must contain a reference to a duplicate of the encapsulated object.

If an object Foo holds an object reference purely for the purpose of encapsulating immutable aspects of the object including identity, then a correct copy of Foo must contain a duplicate of the reference; it must NOT contain a reference to a duplicated object.
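The cases above can be illustrated with a single hypothetical class, under these assumptions: immutable state without identity is held through std::shared_ptr<const std::string>, so sharing it is as good as duplicating it; mutable state that will be mutated is duplicated; and a pointer that stands for an object's identity (a hypothetical Account) is copied as a pointer, never cloned.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entity whose identity matters: there is exactly one per real account.
struct Account { std::string id; };

class Foo {
public:
    Foo(std::string label, Account* account)
        : label_(std::make_shared<std::string>(std::move(label))),
          scratch_(std::make_unique<std::vector<int>>()),
          account_(account) {}

    Foo(const Foo& other)
        : label_(other.label_),  // immutable, no identity: sharing is fine
          scratch_(std::make_unique<std::vector<int>>(*other.scratch_)),  // mutated: duplicate
          account_(other.account_) {}  // identity: copy the reference, never the object

    Foo& operator=(const Foo& other) {
        if (this != &other) {
            label_ = other.label_;
            scratch_ = std::make_unique<std::vector<int>>(*other.scratch_);
            account_ = other.account_;
        }
        return *this;
    }

    const std::string& label() const { return *label_; }
    std::vector<int>& scratch() { return *scratch_; }
    Account* account() const { return account_; }

private:
    std::shared_ptr<const std::string> label_;
    std::unique_ptr<std::vector<int>> scratch_;  // held by pointer only to make the choice explicit
    Account* account_;
};

// Usage: each member is copied according to the kind of state it encapsulates.
void foo_demo() {
    Account acct{"ACC-1"};
    Foo original("report", &acct);
    Foo copy = original;
    copy.scratch().push_back(7);
    assert(original.scratch().empty());       // mutable state was duplicated
    assert(copy.account() == &acct);          // same account, not a clone of it
    assert(&copy.label() == &original.label());  // immutable label is shared
}
```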

If an object Foo holds an object reference for the purpose of encapsulating both mutable state and object identity, then it is not possible to produce a correct copy of Foo in isolation. A correct copy of Foo may only be produced by duplicating the entire set of objects to which it is attached.
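A sketch of this last case, assuming a hypothetical Graph whose Node objects refer to each other by pointer (identity) and carry mutable values. Copying one Node alone would leave its links pointing into the old graph, so the copy constructor duplicates every node first and then rewires the links through an old-to-new map.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical node: neighbours are identified by pointer and their state is mutable,
// so a single Node cannot be copied correctly in isolation.
struct Node {
    int value = 0;
    std::vector<Node*> neighbours;  // assumed to point only at nodes of the same Graph
};

class Graph {
public:
    Graph() = default;

    Node* add(int value) {
        nodes_.push_back(std::make_unique<Node>());
        nodes_.back()->value = value;
        return nodes_.back().get();
    }

    // Copy the whole set of connected objects, then rewire every pointer
    // so it refers to the corresponding node in the new graph.
    Graph(const Graph& other) {
        std::unordered_map<const Node*, Node*> remap;
        for (const auto& node : other.nodes_) {
            remap[node.get()] = add(node->value);
        }
        for (const auto& node : other.nodes_) {
            Node* copy = remap.at(node.get());
            for (const Node* neighbour : node->neighbours) {
                copy->neighbours.push_back(remap.at(neighbour));
            }
        }
    }

    Graph& operator=(const Graph&) = delete;  // left out of the sketch for brevity

    std::size_t size() const { return nodes_.size(); }
    Node* at(std::size_t i) const { return nodes_[i].get(); }

private:
    std::vector<std::unique_ptr<Node>> nodes_;
};

// Usage: links in the copy point at the copied nodes, not at the originals.
void graph_demo() {
    Graph g;
    Node* a = g.add(1);
    Node* b = g.add(2);
    a->neighbours.push_back(b);
    b->neighbours.push_back(a);

    Graph copy = g;
    Node* a2 = copy.at(0);
    Node* b2 = copy.at(1);
    assert(a2 != a && b2 != b);
    assert(a2->neighbours[0] == b2);
    assert(b2->neighbours[0] == a2);
}
```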

The only time it makes sense to talk about a "shallow copy" is when an incomplete operation is used as one of the steps in making a correct copy. Otherwise, there is only one correct copy "depth", controlled by the type of state encapsulated in object references.
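As an illustration of a shallow copy used as one step of a correct copy, here is a sketch with a hypothetical Widget: its copy constructor first performs a member-wise (shallow) copy of all its state, which on its own would leave two objects owning the same buffer, and then replaces that one pointer with a duplicate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical widget. Step 1 of its copy is a member-wise (shallow) copy of
// every field; step 2 replaces the one owned pointer with a fresh duplicate.
class Widget {
public:
    Widget(std::string title, std::size_t count)
        : s_{std::move(title), 0, 0, count, new unsigned char[count]()} {}

    Widget(const Widget& other) : s_(other.s_) {   // step 1: shallow copy, incomplete
        s_.pixels = new unsigned char[s_.count];    // step 2: duplicate the owned buffer
        std::copy(other.s_.pixels, other.s_.pixels + s_.count, s_.pixels);
    }

    Widget& operator=(const Widget&) = delete;  // left out of the sketch for brevity
    ~Widget() { delete[] s_.pixels; }

    unsigned char* pixels() const { return s_.pixels; }
    const std::string& title() const { return s_.title; }

private:
    struct State {
        std::string title;
        int x;
        int y;
        std::size_t count;
        unsigned char* pixels;  // owned by the Widget
    } s_;
};

// Usage: after both steps, the copy owns its own pixel buffer.
void widget_demo() {
    Widget original("icon", 4);
    original.pixels()[0] = 9;
    Widget copy = original;
    assert(copy.pixels() != original.pixels());
    assert(copy.pixels()[0] == 9);
    assert(copy.title() == "icon");
}
```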

answered Sep 22 '22 23:09

supercat
