I am struggling with this problem and would really appreciate any help.
I'm working on an existing project. I have added logic that counts combinations of values and makes sure we do not pass a certain limit. For example, given these data table columns:
Name|Age|description
The code makes sure we do not have more than K combinations of (Name, Age). My data contains around a million such pairs. At some point the program just crashes, or gets stuck, though I do not see any memory or CPU issue.
I implemented this limit using a ConcurrentDictionary whose keys are the (Name, Age) pairs, and I'm using C# on .NET 6.
I can see that, at some point, trying to add an element to the data structure starts taking a really long time.
Edit: adding some code pieces. There is a lot of inner implementation around them, but I believe these are the main parts needed to understand the problem.
Here is the component that's responsible for limiting the keys:
protected override Result Process(Row row)
{
    var valueToLimit = GetValueToLimit(row);
    var result = _values.TryAdd(valueToLimit);

    // some logic related to the case of crossing the limit

    return Result.Success;
}

protected abstract T GetValueToLimit(Row row);
}
The function GetValueToLimit is implemented for my case:
protected override string[] GetValueToLimit(Row row)
{
    // Takes the relevant values from the input record, according to the requested columns.
    return _columnIndices.Select(x => row.GetValue(x)).ToArray();
}
And finally, here are some parts of the concurrent HashSet implementation:
public class BoundedConcurrentHashSet<K> : ConcurrentHashSet<K>
{
    ..

    public override Result TryAdd(K element)
    {
        if (Dictionary.Count() < _maxCapacity)
        {
            return base.TryAdd(element);
        }
        else
        {
            return Contains(element) ? Result.AlreadyInHash : Result.ExceedsCapacity;
        }
    }
}
where ConcurrentHashSet is implemented on top of C#'s ConcurrentDictionary:
public class ConcurrentHashSet<K>
{
    public ConcurrentHashSet(IEqualityComparer<K> equalityComparer)
    {
        Dictionary = new ConcurrentDictionary<K, object>(equalityComparer);
    }

    protected ConcurrentDictionary<K, object> Dictionary { get; }

    public int Count => Dictionary.Count;

    public IEnumerable<K> Elements => Dictionary.Keys;

    public virtual Result TryAdd(K element)
    {
        return Dictionary.TryAdd(element, null) ? Result.Added : Result.AlreadyInHash;
    }

    public bool Contains(K element)
    {
        return Dictionary.ContainsKey(element);
    }
}
Please share any ideas that can help.
Thanks
Here is your problem:
public override Result TryAdd(K element)
{
    if (Dictionary.Count() < _maxCapacity)
    {
        return base.TryAdd(element);
    }
    //...
...where Dictionary is the underlying ConcurrentDictionary<K, object> object.
Count() is a LINQ extension method that either enumerates the sequence from start to end, or returns the value of the Count property in case the sequence implements the ICollection<TSource> interface. ConcurrentDictionary<TKey,TValue> does implement that interface, so it's the Count property that gets used.
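In other words, the call resolves to roughly this (a simplified sketch of the BCL logic, not the actual source):

// Simplified sketch of what Enumerable.Count() does:
public static int Count<TSource>(this IEnumerable<TSource> source)
{
    if (source is ICollection<TSource> collection)
        return collection.Count; // ConcurrentDictionary takes this branch

    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator())
        while (e.MoveNext())
            count++;
    return count;
}

Here is what the documentation of the Count property says: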
This property has snapshot semantics and represents the number of items in the
ConcurrentDictionary<TKey,TValue> at the moment when the property was accessed.
The "snapshot semantics" is the important part. It means that in order to acquire the Count, the dictionary has to be completely locked temporarily. When a thread reads the Count, all other threads have to wait. No concurrency at all.
An ApproximateCount property was proposed at some point on GitHub, but it didn't get enough traction and the issue is now closed. That property would have allowed you to implement the BoundedConcurrentHashSet functionality with much less overhead, but also with less accurate behavior: it would be possible to exceed the _maxCapacity configuration.
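You can approximate that idea yourself by maintaining a separate counter with Interlocked, so the hot path never pays for Count. Here is a sketch, with the same caveat: under contention the set can slightly overshoot _maxCapacity, because several threads can pass the check before any of them increments the counter.

// Sketch: guard the capacity with a manually maintained counter instead of
// ConcurrentDictionary.Count, which takes all of the dictionary's locks.
private int _approximateCount; // incremented only on successful adds

public override Result TryAdd(K element)
{
    if (Volatile.Read(ref _approximateCount) >= _maxCapacity)
    {
        return Contains(element) ? Result.AlreadyInHash : Result.ExceedsCapacity;
    }

    var result = base.TryAdd(element);
    if (result == Result.Added)
    {
        Interlocked.Increment(ref _approximateCount);
    }
    return result;
}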
My suggestion is to ditch the ConcurrentDictionary<K, object> and use a HashSet<K> as the underlying storage, protected with a lock.
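A minimal sketch of that approach (the Result enum is yours; the class and member names here are mine):

// Sketch: bounded set backed by a plain HashSet<K> behind a lock.
// HashSet<K>.Count is an O(1) read, so the capacity check is essentially free.
public class BoundedLockedHashSet<K>
{
    private readonly HashSet<K> _set;
    private readonly int _maxCapacity;
    private readonly object _gate = new object();

    public BoundedLockedHashSet(int maxCapacity, IEqualityComparer<K> comparer)
    {
        _maxCapacity = maxCapacity;
        _set = new HashSet<K>(comparer);
    }

    public Result TryAdd(K element)
    {
        lock (_gate)
        {
            if (_set.Contains(element)) return Result.AlreadyInHash;
            if (_set.Count >= _maxCapacity) return Result.ExceedsCapacity;
            _set.Add(element);
            return Result.Added;
        }
    }
}

An uncontended lock acquisition is very cheap compared to the hashing and comparison work each add does anyway.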
I have found that using normal collections, with a lock around the places where you iterate and add, is a lot faster than using the Concurrent collections, and the gap grows with the number of items in the collection.
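If you want to verify that on your own workload, a crude micro-benchmark along these lines (all names here are mine) makes the difference visible quickly:

// Crude comparison: adds guarded by ConcurrentDictionary.Count vs. a locked HashSet.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

class CountContentionDemo
{
    static void Main()
    {
        const int N = 1_000_000;
        const int Cap = int.MaxValue; // never reached; we only pay for the check

        // Variant 1: capacity check via Count (same cost as the LINQ Count()
        // in the question: it acquires all of the dictionary's internal locks).
        var dict = new ConcurrentDictionary<int, object>();
        var sw = Stopwatch.StartNew();
        Parallel.For(0, N, i =>
        {
            if (dict.Count < Cap)
                dict.TryAdd(i, null);
        });
        Console.WriteLine($"ConcurrentDictionary + Count: {sw.Elapsed}");

        // Variant 2: plain HashSet behind a lock; Count is a cheap field read.
        var set = new HashSet<int>();
        var gate = new object();
        sw.Restart();
        Parallel.For(0, N, i =>
        {
            lock (gate)
            {
                if (set.Count < Cap)
                    set.Add(i);
            }
        });
        Console.WriteLine($"HashSet + lock: {sw.Elapsed}");
    }
}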