Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Concurrent HashSet<T> in .NET Framework?

People also ask

What is concurrent HashSet?

ConcurrentHashSet can be created by using ConcurrentHashMap as it allows us to use keySet(default value) and newKeySet() methods to return the Set, which happens to be a proper Set. This gives us access to necessary functions like contains(), remove(), etc.

What is concurrent collection in C#?

Concurrent namespace. This has several collection classes that are thread-safe and scalable. These collections are called concurrent collections because they can be accessed by multiple threads at a time.

Is HashSet thread-safe?

Thread Safe HashSet Using ConcurrentHashMap Factory Method Basically, this method returns an instance that respects the java. util. Set interface and allows the usage of standard methods like add(), contains(), etc.

Is AddRange thread-safe?

No it is not thread-safe. Thread A could call AddRange on your list. It could iterate partially over the collection and switch threads. Thread B could call Add/Remove, etc.


Your implementation is correct. The .NET Framework does not provide a built-in concurrent hashset type, unfortunately. However, there are some workarounds.

ConcurrentDictionary (recommended)

This first one is to use the class ConcurrentDictionary<TKey, TValue> in the namespace System.Collections.Concurrent. In the case, the value is pointless, so we can use a simple byte (1 byte in memory).

private ConcurrentDictionary<string, byte> _data;

This is the recommended option because the type is thread-safe and provide you the same advantages than a HashSet<T> except key and value are different objects.

Source: Social MSDN

ConcurrentBag

If you don't mind about the duplicate entries, you can use the class ConcurrentBag<T> in the same namespace of the previous class.

private ConcurrentBag<string> _data;

Self-implementation

Finally, as you did, you can implement your own data type, using lock or other ways that the .NET provides you to be thread-safe. Here is a great example: How to implement ConcurrentHashSet in .Net

The only drawback of this solution is that the type HashSet<T> doesn't officially concurrent access, even for reading operations.

I quote the code of the linked post (originally written by Ben Mosher).

using System;
using System.Collections.Generic;
using System.Threading;

namespace BlahBlah.Utilities
{
    public class ConcurrentHashSet<T> : IDisposable
    {
        private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
        private readonly HashSet<T> _hashSet = new HashSet<T>();

        #region Implementation of ICollection<T> ...ish
        public bool Add(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Add(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public void Clear()
        {
            _lock.EnterWriteLock();
            try
            {
                _hashSet.Clear();
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public bool Contains(T item)
        {
            _lock.EnterReadLock();
            try
            {
                return _hashSet.Contains(item);
            }
            finally
            {
                if (_lock.IsReadLockHeld) _lock.ExitReadLock();
            }
        }

        public bool Remove(T item)
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Remove(item);
            }
            finally
            {
                if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }
        }

        public int Count
        {
            get
            {
                _lock.EnterReadLock();
                try
                {
                    return _hashSet.Count;
                }
                finally
                {
                    if (_lock.IsReadLockHeld) _lock.ExitReadLock();
                }
            }
        }
        #endregion

        #region Dispose
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }
        protected virtual void Dispose(bool disposing)
        {
            if (disposing)
                if (_lock != null)
                    _lock.Dispose();
        }
        ~ConcurrentHashSet()
        {
            Dispose(false);
        }
        #endregion
    }
}

EDIT: Move the entrance lock methods ouside the try blocks, as they could throw an exception and execute the instructions contained in the finally blocks.


Instead of wrapping a ConcurrentDictionary or locking over a HashSet I created an actual ConcurrentHashSet based on ConcurrentDictionary.

This implementation supports basic operations per item without HashSet's set operations as they make less sense in concurrent scenarios IMO:

var concurrentHashSet = new ConcurrentHashSet<string>(
    new[]
    {
        "hamster",
        "HAMster",
        "bar",
    },
    StringComparer.OrdinalIgnoreCase);

concurrentHashSet.TryRemove("foo");

if (concurrentHashSet.Contains("BAR"))
{
    Console.WriteLine(concurrentHashSet.Count);
}

Output: 2

You can get it from NuGet here and see the source on GitHub here.


Since noone else mentioned it, I will offer an alternative approach that may or may not be appropriate for your particular purpose:

Microsoft Immutable Collections

From a blog post by the MS team behind:

While creating and running concurrently is easier than ever, one of the fundamental problems still exists: mutable shared state. Reading from multiple threads is typically very easy, but once the state needs to be updated, it gets a lot harder, especially in designs that require locking.

An alternative to locking is making use of immutable state. Immutable data structures are guaranteed to never change and can thus be passed freely between different threads without worrying about stepping on somebody else’s toes.

This design creates a new problem though: How do you manage changes in state without copying the entire state each time? This is especially tricky when collections are involved.

This is where immutable collections come in.

These collections include ImmutableHashSet<T> and ImmutableList<T>.

Performance

Since the immutable collections use tree data structures underneath to enable structural sharing, their performance characteristics are different from mutable collections. When comparing to a locking mutable collection, the results will depend on lock contention and access patterns. However, taken from another blog post about the immutable collections:

Q: I’ve heard that immutable collections are slow. Are these any different? Can I use them when performance or memory is important?

A: These immutable collections have been highly tuned to have competitive performance characteristics to the mutable collections while balancing memory sharing. In some cases they are very nearly as fast as the mutable collections both algorithmically and in actual time, sometimes even faster, while in other cases they are algorithmically more complex. In many cases however the difference will be negligible. Generally you should use the simplest code to get the job done and then tune for performance as necessary. The immutable collections help you write simple code, especially when thread-safety has to be considered.

In other words, in many cases the difference won't be noticeable and you should go with the simpler choice - which for concurrent sets would be to use ImmutableHashSet<T>, since you don't have an existing locking mutable implementation! :-)


The tricky part about making an ISet<T> concurrent is that the set methods (union, intersection, difference) are iterative in nature. At the very least you have to iterate over all n members of one of the sets involved in the operation, while locking both sets.

You lose the advantages of a ConcurrentDictionary<T,byte> when you have to lock the entire set during iteration. Without locking, these operations are not thread safe.

Given the added overhead of ConcurrentDictionary<T,byte>, it's probably wiser just to use the lighter weight HashSet<T> and just surround everything in locks.

If you don't need the set operations, use ConcurrentDictionary<T,byte> and just use default(byte) as the value when you are adding keys.


I prefer complete solutions so i did this: Mind you my Count is implemented in a different fashion because i don't see why one should be forbidden to read the hashset while attempting to count its values.

@Zen, Thanks for getting it started.

[DebuggerDisplay("Count = {Count}")]
[Serializable]
public class ConcurrentHashSet<T> : ICollection<T>, ISet<T>, ISerializable, IDeserializationCallback
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);

    private readonly HashSet<T> _hashSet = new HashSet<T>();

    public ConcurrentHashSet()
    {
    }

    public ConcurrentHashSet(IEqualityComparer<T> comparer)
    {
        _hashSet = new HashSet<T>(comparer);
    }

    public ConcurrentHashSet(IEnumerable<T> collection)
    {
        _hashSet = new HashSet<T>(collection);
    }

    public ConcurrentHashSet(IEnumerable<T> collection, IEqualityComparer<T> comparer)
    {
        _hashSet = new HashSet<T>(collection, comparer);
    }

    protected ConcurrentHashSet(SerializationInfo info, StreamingContext context)
    {
        _hashSet = new HashSet<T>();

        // not sure about this one really...
        var iSerializable = _hashSet as ISerializable;
        iSerializable.GetObjectData(info, context);
    }

    #region Dispose

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
            if (_lock != null)
                _lock.Dispose();
    }

    public IEnumerator<T> GetEnumerator()
    {
        return _hashSet.GetEnumerator();
    }

    ~ConcurrentHashSet()
    {
        Dispose(false);
    }

    public void OnDeserialization(object sender)
    {
        _hashSet.OnDeserialization(sender);
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        _hashSet.GetObjectData(info, context);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    #endregion

    public void Add(T item)
    {
        _lock.EnterWriteLock();
        try
        {
            _hashSet.Add(item);
        }
        finally
        {
            if(_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public void UnionWith(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        _lock.EnterReadLock();
        try
        {
            _hashSet.UnionWith(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            if (_lock.IsReadLockHeld) _lock.ExitReadLock();
        }
    }

    public void IntersectWith(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        _lock.EnterReadLock();
        try
        {
            _hashSet.IntersectWith(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            if (_lock.IsReadLockHeld) _lock.ExitReadLock();
        }
    }

    public void ExceptWith(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        _lock.EnterReadLock();
        try
        {
            _hashSet.ExceptWith(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            if (_lock.IsReadLockHeld) _lock.ExitReadLock();
        }
    }

    public void SymmetricExceptWith(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            _hashSet.SymmetricExceptWith(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool IsSubsetOf(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.IsSubsetOf(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool IsSupersetOf(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.IsSupersetOf(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool IsProperSupersetOf(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.IsProperSupersetOf(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool IsProperSubsetOf(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.IsProperSubsetOf(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool Overlaps(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.Overlaps(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool SetEquals(IEnumerable<T> other)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.SetEquals(other);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    bool ISet<T>.Add(T item)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.Add(item);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public void Clear()
    {
        _lock.EnterWriteLock();
        try
        {
            _hashSet.Clear();
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool Contains(T item)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.Contains(item);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        _lock.EnterWriteLock();
        try
        {
            _hashSet.CopyTo(array, arrayIndex);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public bool Remove(T item)
    {
        _lock.EnterWriteLock();
        try
        {
            return _hashSet.Remove(item);
        }
        finally
        {
            if (_lock.IsWriteLockHeld) _lock.ExitWriteLock();
        }
    }

    public int Count
    {
        get
        {
            _lock.EnterWriteLock();
            try
            {
                return _hashSet.Count;
            }
            finally
            {
                if(_lock.IsWriteLockHeld) _lock.ExitWriteLock();
            }

        }
    }

    public bool IsReadOnly
    {
        get { return false; }
    }
}