
Efficient way to clone a HashSet<T>?

A few days ago, I answered an interesting question on SO about HashSet<T>. A possible solution involved cloning the hashset, and in my answer I suggested doing something like this:

    HashSet<int> original = ...
    HashSet<int> clone = new HashSet<int>(original);

Although this approach is quite straightforward, I suspect it's very inefficient: the constructor of the new HashSet<T> has to add each item from the original hashset separately and check that it isn't already present. This is clearly a waste of time: since the source collection is an ISet<T>, it is guaranteed not to contain duplicates. There should be a way to take advantage of that knowledge...

Ideally, HashSet<T> would implement ICloneable, but unfortunately it doesn't. I also checked with Reflector to see whether the HashSet<T> constructor does something specific when the source collection is a hashset, but it doesn't. It could probably be done by using reflection on private fields, but that would be an ugly hack...
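Since HashSet<T> offers no Clone() of its own, the closest thing is a small wrapper around the copy constructor. The following is a minimal sketch; the class name HashSetExtensions and the method name Clone are hypothetical, not part of the framework, and the helper is no faster than calling the constructor directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not a framework API): gives HashSet<T> a Clone() method.
public static class HashSetExtensions
{
    public static HashSet<T> Clone<T>(this HashSet<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Copy constructor. Passing source.Comparer keeps the same equality rules
        // in the copy. The elements themselves are not cloned (shallow copy).
        return new HashSet<T>(source, source.Comparer);
    }
}
```

With this in scope, `var clone = original.Clone();` yields an independent set: adding to or removing from the clone leaves the original untouched.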

So, has anyone come up with a clever solution to clone a hashset more efficiently?

(Note that this question is purely theoretical; I don't need to do this in a real program.)

Thomas Levesque asked Oct 13 '10 20:10




2 Answers

If you really want the most efficient way to clone a HashSet<T>, you could do the following (possibly at the cost of maintainability):

  1. Use Reflector or the debugger to figure out exactly which fields in HashSet<T> need to be copied. You may need to do this recursively for each field.
  2. Use Reflection.Emit or expression trees to generate a method that copies all of those fields. It may need to call other generated methods that copy the value of each field. We're using runtime code generation because it's the only way to directly access private fields.
  3. Use FormatterServices.GetUninitializedObject(...) to instantiate a blank object, then use the method generated in step 2 to copy the original object into the new blank object.
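The steps above can be roughly sketched as follows. To stay short, this sketch swaps the generated code of step 2 for plain reflection (FieldInfo.GetValue and SetValue), so it shows the field-copying idea but not the speed that Reflection.Emit or expression trees would aim for. The names HashSetFieldCloner and CloneByFields are hypothetical, and the code assumes that every array field of HashSet<T> is private storage that a shallow array copy duplicates safely, which depends on the runtime's internal layout.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical helper (not a framework API). It relies on HashSet<T> internals,
// so it may break on runtimes whose private layout differs from this assumption.
public static class HashSetFieldCloner
{
    public static HashSet<T> CloneByFields<T>(HashSet<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

#pragma warning disable SYSLIB0050 // FormatterServices is marked obsolete on recent .NET
        // Step 3: allocate an instance without running any constructor.
        var clone = (HashSet<T>)FormatterServices.GetUninitializedObject(typeof(HashSet<T>));
#pragma warning restore SYSLIB0050

        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        // Steps 1 and 2, done with reflection instead of generated code:
        // copy every instance field from the source to the blank object.
        foreach (FieldInfo field in typeof(HashSet<T>).GetFields(flags))
        {
            object value = field.GetValue(source);

            // Assumption: array fields are internal storage. Duplicate them so
            // the two sets do not share (and corrupt) each other's buckets.
            if (value is Array array)
            {
                value = array.Clone();
            }

            field.SetValue(clone, value);
        }

        return clone;
    }
}
```

Calling `HashSetFieldCloner.CloneByFields(original)` should return a set with the same elements and comparer as the original. Whether it beats the copy constructor would have to be measured; with plain reflection it quite possibly does not.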

jthg answered Oct 08 '22 12:10


I checked the .NET Framework source code for both version 4.5.2 and version 4.7.2. Version 4.7.2 does have an optimization in the constructor for the case where the passed-in collection is a HashSet<T>, using some internal cloning logic. You also need to pass the source set's comparer into the constructor for this logic to work. Version 4.5.2 does not seem to have this optimization.

Example:

    var clonedSet = new HashSet<int>(set, set.Comparer);
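Independently of any fast path, passing the comparer also matters for correctness when the source set uses a non-default comparer. The small sketch below illustrates this; the class name ComparerCloneDemo is made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerCloneDemo
{
    public static void Main()
    {
        // Source set that ignores case when comparing strings.
        var set = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "abc" };

        // Clone that reuses the source's comparer.
        var withComparer = new HashSet<string>(set, set.Comparer);

        // Clone built with the one-argument constructor: it falls back to the
        // default comparer, which is case-sensitive for strings.
        var withoutComparer = new HashSet<string>(set);

        Console.WriteLine(withComparer.Contains("ABC"));    // True
        Console.WriteLine(withoutComparer.Contains("ABC")); // False
    }
}
```

The second clone holds the same element but no longer behaves like the original, so the two-argument form is the safer default when cloning.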

Mafu Josh answered Oct 08 '22 10:10