Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Subtract HashSets (and return a copy)?

Tags:

c#

hashset

I've got a HashSet,

var universe = new HashSet<int>();

And a bunch of subsets,

var sets = new List<HashSet<int>>(numSets);

I want to subtract a chunk, which I can do like this:

var remaining = universe.ExceptWith(sets[0]);

But ExceptWith works in-place. I don't want to modify the universe. Should I clone it first, or is there a better way?

like image 677
mpen Avatar asked Oct 09 '10 19:10

mpen


People also ask

What are HashSets?

HashSet is a class that extends AbstractSet and implements the Set interface in Java. It is a very useful tool that allows you to store unique items and access them in constant time (on average). No duplicate values are stored.

What is a HashSet used for?

A HashSet is usually used for high-performance operations involving a set of unique data. Since HashSet contains only unique elements, its internal structure is optimized for faster searches. Note that you can store a single null value in a HashSet.

How do you make a shallow copy of a HashSet?

util. HashSet. clone() method is used to return a shallow copy of the mentioned hash set. It just creates a copy of the set.

How to define a HashSet in Java?

Creating a HashSet Once we import the package, here is how we can create hash sets in Java. // HashSet with 8 capacity and 0.75 load factor HashSet<Integer> numbers = new HashSet<>(8, 0.75); Here, we have created a hash set named numbers . Notice, the part new HashSet<>(8, 0.75) .


4 Answers

I guess I should clone it first? How do I do that?

var universe = new HashSet<int>();
var subset = new HashSet<int>();
...

// clone the universe
var remaining = new HashSet<int>(universe);
remaining.ExceptWith(subset);

Not as simple as with the Except extension method, but probably faster (you should run a few performance tests to make sure)

like image 147
Thomas Levesque Avatar answered Oct 27 '22 11:10

Thomas Levesque


How about Except()?

var x = new HashSet<int>();
var y = new HashSet<int>();

var xminusy = new HashSet<int>(x.Except(y));
like image 42
James McNellis Avatar answered Oct 27 '22 12:10

James McNellis


I benchmarked Linq's Except method against cloning and using the HashSet-native function ExceptWith. Here are the results.

static class Program
{
    public static HashSet<T> ToSet<T>(this IEnumerable<T> collection)
    {
        return new HashSet<T>(collection);
    }

    public static HashSet<T> Subtract<T>(this HashSet<T> set, IEnumerable<T> other)
    {
        var clone = set.ToSet();
        clone.ExceptWith(other);
        return clone;
    }

    static void Main(string[] args)
    {
        var A = new HashSet<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        var B = new HashSet<int> { 2, 4, 6, 8, 10 };
        var sw = new Stopwatch();

        sw.Restart();
        for (int i = 0; i < 1000000; ++i)
        {
            var C = A.Except(B).ToSet();
        }
        sw.Stop();
        Console.WriteLine("Linq: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < 1000000; ++i)
        {
            var C = A.Subtract(B);
        }
        sw.Stop();
        Console.WriteLine("Native: {0} ms", sw.ElapsedMilliseconds);

        Console.ReadLine();
    }
}

Linq: 1297 ms
Native: 762 ms

http://programanddesign.com/cs/subtracting-sets/

like image 23
mpen Avatar answered Oct 27 '22 12:10

mpen


A hash set has to track its hash algorithm constants, and its overflow bins. The elements in the set are held by reference. Creating a new hash with the copy constructor, as Thomas Levesque suggests, creates a shallow copy of this overhead and should be quite fast. Using Except() in the way that James McNellis suggests first creates an anonymous copy and then passes that to the copy constructor which uses the fields in the anonymous to initialize its own fields. As Thomas said, you might do a few performance tests, but theoretically his answer should beat James' answer. And by the way, to my way of thinking, a shallow copy is not a clone since I believe a clone implies that the underlying elements are also copied. Hash sets with common elements use a copy when modified strategy.

like image 1
verisimilidude Avatar answered Oct 27 '22 10:10

verisimilidude