I wonder what is the fastest way to do shallow copying in C#? I only know there are 2 ways to do shallow copy: <ol> <li>MemberwiseClone</li> <li>Copy each field one by one (manual)</li> </ol> I found that (2) is faster than (1). I'm wondering if there's another way to do shallow copying?

This is a complex subject with lots of possible solutions and many pros and cons to each. There is a wonderful article here that outlines several different ways of making a copy in C#. To summarize: <ol> <li>Clone Manually Tedious, but high level of control.</li> <li>Clone with MemberwiseClone Only creates a shallow copy, i.e. for reference-type fields the original object and its clone refer to the same object.</li> <li>Clone with Reflection Shallow copy by default, can be re-written to do deep copy. Advantage: automated. Disadvantage: reflection is slow.</li> <li>Clone with Serialization Easy, automated. Give up some control and serialization is slowest of all.</li> <li>Clone with IL, Clone with Extension Methods More advanced solutions, not as common.</li> </ol>

I'd like to start with a few quotes: <blockquote> In fact, MemberwiseClone is usually much better than others, especially for complex type. </blockquote> and <blockquote> I'm confused. MemberwiseClone() should annihilate the performance of anything else for a shallow copy. [...] </blockquote> Theoretically the best implementation of a shallow copy is a C++ copy constructor: it knows the size compile-time, and then does a memberwise clone of all fields. The next best thing is using <code>memcpy</code> or something similar, which is basically how <code>MemberwiseClone</code> should work. This means, in theory it should obliterate all other possibilities in terms of performance. Right? ... but apparently it isn't blazing fast and it doesn't obliterate all the other solutions. At the bottom I've actually posted a solution that's over 2x faster. So: Wrong. Testing the internals of MemberwiseClone Let's start with a little test using a simple blittable type to check the underlying assumptions here about performance: <pre class="prettyprint"><code>[StructLayout(LayoutKind.Sequential)] public class ShallowCloneTest { public int Foo; public long Bar; public ShallowCloneTest Clone() { return (ShallowCloneTest)base.MemberwiseClone(); } } </code></pre> The test is devised in such a way that we can check the performance of <code>MemberwiseClone</code> agaist raw <code>memcpy</code>, which is possible because this is a blittable type. To test by yourself, compile with unsafe code, disable the JIT suppression, compile release mode and test away. I've also put the timings after every line that's relevant. Implementation 1: <pre class="prettyprint"><code>ShallowCloneTest t1 = new ShallowCloneTest() { Bar = 1, Foo = 2 }; Stopwatch sw = Stopwatch.StartNew(); int total = 0; for (int i = 0; i < 10000000; ++i) { var cloned = t1.Clone(); // 0.40s total += cloned.Foo; } Console.WriteLine("Took {0:0.00}s", sw.Elapsed.TotalSeconds); </code></pre> Basically I ran these tests a number of times, checked the assembly output to ensure that the thing wasn't optimized away, etc. The end result is that I know approximately how much seconds this one line of code costs, which is 0.40s on my PC. This is our baseline using <code>MemberwiseClone</code>. Implementation 2: <pre class="prettyprint"><code>sw = Stopwatch.StartNew(); total = 0; uint bytes = (uint)Marshal.SizeOf(t1.GetType()); GCHandle handle1 = GCHandle.Alloc(t1, GCHandleType.Pinned); IntPtr ptr1 = handle1.AddrOfPinnedObject(); for (int i = 0; i < 10000000; ++i) { ShallowCloneTest t2 = new ShallowCloneTest(); // 0.03s GCHandle handle2 = GCHandle.Alloc(t2, GCHandleType.Pinned); // 0.75s (+ 'Free' call) IntPtr ptr2 = handle2.AddrOfPinnedObject(); // 0.06s memcpy(ptr2, ptr1, new UIntPtr(bytes)); // 0.17s handle2.Free(); total += t2.Foo; } handle1.Free(); Console.WriteLine("Took {0:0.00}s", sw.Elapsed.TotalSeconds); </code></pre> If you look closely at these numbers, you'll notice a few things: <ul> <li>Creating an object and copying it will take roughly 0.20s. Under normal circumstances this is the fastest possible code you can have.</li> <li>However, to do that, you need to pin and unpin the object. That will take you 0.81 seconds.</li> </ul> So why is all of this so slow? My explanation is that it has to do with the GC. Basically the implementations cannot rely on the fact that memory will stay the same before and after a full GC (The address of the memory can be changed during a GC, which can happen at any moment, including during your shallow copy). This means you only have 2 possible options: <ol> <li>Pinning the data and doing a copy. Note that <code>GCHandle.Alloc</code> is just one of the ways to do this, it's well known that things like C++/CLI will give you better performance. </li> <li>Enumerating the fields. This will ensure that between GC collects you don't need to do anything fancy, and during GC collects you can use the GC ability to modify the addresses on the stack of moved objects.</li> </ol> <code>MemberwiseClone</code> will use method 1, which means you'll get a performance hit because of the pinning procedure. A (much) faster implementation In all cases our unmanaged code cannot make assumptions about the size of the types and it has to pin data. Making assumptions about size enables the compiler to do better optimizations, like loop unrolling, register allocation, etc. (just like a C++ copy ctor is faster than <code>memcpy</code>). Not having to pin data means we don't get an extra performance hit. Since .NET JIT's to assembler, in theory this means that we should be able to make a faster implementation using simple IL emitting, and allowing the compiler to optimize it. So to summarize on why this can be faster than the native implementation? <ol> <li>It doesn't require the object to be pinned; objects that are moving around are handled by the GC -- and really, this is relentlessly optimized.</li> <li>It can make assumptions about the size of the structure to copy, and therefore allows for better register allocation, loop unrolling, etc.</li> </ol> What we're aiming for is the performance of raw <code>memcpy</code> or better: 0.17s. To do that, we basically cannot use more than just a <code>call</code>, create the object, and perform a bunch of <code>copy</code> instructions. It looks a bit like the <code>Cloner</code> implementation above, but some important differences (most significant: no <code>Dictionary</code> and no redundant <code>CreateDelegate</code> calls). Here goes: <pre class="prettyprint"><code>public static class Cloner<T> { private static Func<T, T> cloner = CreateCloner(); private static Func<T, T> CreateCloner() { var cloneMethod = new DynamicMethod("CloneImplementation", typeof(T), new Type[] { typeof(T) }, true); var defaultCtor = typeof(T).GetConstructor(new Type[] { }); var generator = cloneMethod .GetILGenerator(); var loc1 = generator.DeclareLocal(typeof(T)); generator.Emit(OpCodes.Newobj, defaultCtor); generator.Emit(OpCodes.Stloc, loc1); foreach (var field in typeof(T).GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)) { generator.Emit(OpCodes.Ldloc, loc1); generator.Emit(OpCodes.Ldarg_0); generator.Emit(OpCodes.Ldfld, field); generator.Emit(OpCodes.Stfld, field); } generator.Emit(OpCodes.Ldloc, loc1); generator.Emit(OpCodes.Ret); return ((Func<T, T>)cloneMethod.CreateDelegate(typeof(Func<T, T>))); } public static T Clone(T myObject) { return cloner(myObject); } } </code></pre> I've tested this code with the result: 0.16s. This means it's approximately 2.5x faster than <code>MemberwiseClone</code>. More importantly, this speed is on-par with <code>memcpy</code>, which is more or less the 'optimal solution under normal circumstances'. Personally, I think this is the fastest solution - and the best part is: if the .NET runtime will get faster (proper support for SSE instructions etc), so will this solution. Editorial Note: The sample code above assumes that the default constructor is public. If it is not, the call to <code>GetConstructor</code> returns null. In that case, use one of the other <code>GetConstructor</code> signatures to obtain protected or private constructors. See https://docs.microsoft.com/en-us/dotnet/api/system.type.getconstructor?view=netframework-4.8

Fastest Way to do Shallow Copy in C#

Video Answer

2 Answers

This is a complex subject with lots of possible solutions and many pros and cons to each. There is a wonderful article here that outlines several different ways of making a copy in C#. To summarize:

Clone Manually
Tedious, but high level of control.
Clone with MemberwiseClone
Only creates a shallow copy, i.e. for reference-type fields the original object and its clone refer to the same object.
Clone with Reflection
Shallow copy by default, can be re-written to do deep copy. Advantage: automated. Disadvantage: reflection is slow.
Clone with Serialization
Easy, automated. Give up some control and serialization is slowest of all.
Clone with IL, Clone with Extension Methods
More advanced solutions, not as common.

150

answered Sep 29 '22 18:09

Nick Stamas

I'd like to start with a few quotes:

In fact, MemberwiseClone is usually much better than others, especially for complex type.

and

I'm confused. MemberwiseClone() should annihilate the performance of anything else for a shallow copy. [...]

Theoretically the best implementation of a shallow copy is a C++ copy constructor: it knows the size compile-time, and then does a memberwise clone of all fields. The next best thing is using memcpy or something similar, which is basically how MemberwiseClone should work. This means, in theory it should obliterate all other possibilities in terms of performance. Right?

... but apparently it isn't blazing fast and it doesn't obliterate all the other solutions. At the bottom I've actually posted a solution that's over 2x faster. So: Wrong.

Testing the internals of MemberwiseClone

Let's start with a little test using a simple blittable type to check the underlying assumptions here about performance:

[StructLayout(LayoutKind.Sequential)] public class ShallowCloneTest {     public int Foo;     public long Bar;      public ShallowCloneTest Clone()     {         return (ShallowCloneTest)base.MemberwiseClone();     } }

The test is devised in such a way that we can check the performance of MemberwiseClone agaist raw memcpy, which is possible because this is a blittable type.

To test by yourself, compile with unsafe code, disable the JIT suppression, compile release mode and test away. I've also put the timings after every line that's relevant.

Implementation 1:

ShallowCloneTest t1 = new ShallowCloneTest() { Bar = 1, Foo = 2 }; Stopwatch sw = Stopwatch.StartNew(); int total = 0; for (int i = 0; i < 10000000; ++i) {     var cloned = t1.Clone();                                    // 0.40s     total += cloned.Foo; }  Console.WriteLine("Took {0:0.00}s", sw.Elapsed.TotalSeconds);

Basically I ran these tests a number of times, checked the assembly output to ensure that the thing wasn't optimized away, etc. The end result is that I know approximately how much seconds this one line of code costs, which is 0.40s on my PC. This is our baseline using MemberwiseClone.

Implementation 2:

sw = Stopwatch.StartNew();  total = 0; uint bytes = (uint)Marshal.SizeOf(t1.GetType()); GCHandle handle1 = GCHandle.Alloc(t1, GCHandleType.Pinned); IntPtr ptr1 = handle1.AddrOfPinnedObject();  for (int i = 0; i < 10000000; ++i) {     ShallowCloneTest t2 = new ShallowCloneTest();               // 0.03s     GCHandle handle2 = GCHandle.Alloc(t2, GCHandleType.Pinned); // 0.75s (+ 'Free' call)     IntPtr ptr2 = handle2.AddrOfPinnedObject();                 // 0.06s     memcpy(ptr2, ptr1, new UIntPtr(bytes));                     // 0.17s     handle2.Free();      total += t2.Foo; }  handle1.Free(); Console.WriteLine("Took {0:0.00}s", sw.Elapsed.TotalSeconds);

If you look closely at these numbers, you'll notice a few things:

Creating an object and copying it will take roughly 0.20s. Under normal circumstances this is the fastest possible code you can have.
However, to do that, you need to pin and unpin the object. That will take you 0.81 seconds.

So why is all of this so slow?

My explanation is that it has to do with the GC. Basically the implementations cannot rely on the fact that memory will stay the same before and after a full GC (The address of the memory can be changed during a GC, which can happen at any moment, including during your shallow copy). This means you only have 2 possible options:

Pinning the data and doing a copy. Note that GCHandle.Alloc is just one of the ways to do this, it's well known that things like C++/CLI will give you better performance.
Enumerating the fields. This will ensure that between GC collects you don't need to do anything fancy, and during GC collects you can use the GC ability to modify the addresses on the stack of moved objects.

MemberwiseClone will use method 1, which means you'll get a performance hit because of the pinning procedure.

A (much) faster implementation

In all cases our unmanaged code cannot make assumptions about the size of the types and it has to pin data. Making assumptions about size enables the compiler to do better optimizations, like loop unrolling, register allocation, etc. (just like a C++ copy ctor is faster than memcpy). Not having to pin data means we don't get an extra performance hit. Since .NET JIT's to assembler, in theory this means that we should be able to make a faster implementation using simple IL emitting, and allowing the compiler to optimize it.

So to summarize on why this can be faster than the native implementation?

It doesn't require the object to be pinned; objects that are moving around are handled by the GC -- and really, this is relentlessly optimized.
It can make assumptions about the size of the structure to copy, and therefore allows for better register allocation, loop unrolling, etc.

What we're aiming for is the performance of raw memcpy or better: 0.17s.

To do that, we basically cannot use more than just a call, create the object, and perform a bunch of copy instructions. It looks a bit like the Cloner implementation above, but some important differences (most significant: no Dictionary and no redundant CreateDelegate calls). Here goes:

public static class Cloner<T> {     private static Func<T, T> cloner = CreateCloner();      private static Func<T, T> CreateCloner()     {         var cloneMethod = new DynamicMethod("CloneImplementation", typeof(T), new Type[] { typeof(T) }, true);         var defaultCtor = typeof(T).GetConstructor(new Type[] { });          var generator = cloneMethod .GetILGenerator();          var loc1 = generator.DeclareLocal(typeof(T));          generator.Emit(OpCodes.Newobj, defaultCtor);         generator.Emit(OpCodes.Stloc, loc1);          foreach (var field in typeof(T).GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))         {             generator.Emit(OpCodes.Ldloc, loc1);             generator.Emit(OpCodes.Ldarg_0);             generator.Emit(OpCodes.Ldfld, field);             generator.Emit(OpCodes.Stfld, field);         }          generator.Emit(OpCodes.Ldloc, loc1);         generator.Emit(OpCodes.Ret);          return ((Func<T, T>)cloneMethod.CreateDelegate(typeof(Func<T, T>)));     }      public static T Clone(T myObject)     {         return cloner(myObject);     } }

I've tested this code with the result: 0.16s. This means it's approximately 2.5x faster than MemberwiseClone.

More importantly, this speed is on-par with memcpy, which is more or less the 'optimal solution under normal circumstances'.

Personally, I think this is the fastest solution - and the best part is: if the .NET runtime will get faster (proper support for SSE instructions etc), so will this solution.

Editorial Note: The sample code above assumes that the default constructor is public. If it is not, the call to GetConstructor returns null. In that case, use one of the other GetConstructor signatures to obtain protected or private constructors. See https://docs.microsoft.com/en-us/dotnet/api/system.type.getconstructor?view=netframework-4.8

answered Sep 29 '22 18:09

atlaste

Related questions
                            
                                DataSet does not support System.Nullable<> in Export
                            
                                What is "string[] args" in Main class for?
                            
                                How to do DI in asp.net core middleware?
                            
                                Remotely connect to .net core self hosted web api
                            
                                call a function for each value in a generic c# collection [duplicate]
                            
                                Add a package with a local package file in 'dotnet'
                            
                                Reading embedded XML file c#
                            
                                WPF Binding UI events to commands in ViewModel
                            
                                Does { } act like ( ) when creating a new object in C#?
                            
                                Need to perform Wildcard (*,?, etc) search on a string using Regex
                            
                                How to get the current product version in C#?
                            
                                MVC Controller return a bad request?
                            
                                Currency format for display
                            
                                Getting controller name from razor
                            
                                How to hide an inherited property in a class without modifying the inherited class (base class)?
                            
                                Is there any benefit of using an Object Initializer?
                            
                                Overriding vs method hiding [duplicate]
                            
                                Browse for a directory in C#
                            
                                windows form .. console.writeline() where is console?
                            
                                Use querystring variables in MVC controller

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Fastest Way to do Shallow Copy in C#

Tags:

c#

shallow-copy

cloning

tep

People also ask

Video Answer

2 Answers

Nick Stamas

atlaste

Recent Activity

Donate For Us