Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# Dictionary internal array size

I'm not sure if this is the right question to ask here but please don't kill me :)

I have an argument with a friend about C#'s dictionary… She tells me that if I have lets say dictionary with 1 element. And the hash code for the key is 100000 then the internal array of the dictionary will be at the size of 100000!

Is it true? I tried to find answers on Google but for some reason I didn't find for that question.

like image 968
Sergey Rotbart Avatar asked Oct 06 '22 12:10

Sergey Rotbart


2 Answers

The default constructor of Dictionary, "has the default initial capacity", according to MSDN.

It also states:

If you can estimate the size of the collection, using a constructor that specifies the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary.

One such constructor simply takes an Int32, which initialises the internal storage as follows:

The initial number of elements that the Dictionary can contain.

What the "default initial capacity" of the dictionary actually is is an internal implementation detail of the class and as such not exposed in the documentation or public API.

Disassembling mscorlib with ilspy and examining the default constructor shows that it is implemented as follows:

public Dictionary() : this(0, null)
{
}

That chained constructor is implemented as follows:

public Dictionary(int capacity, IEqualityComparer<TKey> comparer)
{
    if (capacity < 0)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity);
    }

    if (capacity > 0)
    {
        this.Initialize(capacity);
    }

    this.comparer = (comparer ?? EqualityComparer<TKey>.Default);
}

i.e. Initialize() is not called at all by the default constructor, either directly or indirectly.

Initialize() is the method that sets up the internal storage.

So actually, if you call the default constructor, the internal storage size is not even initialised until you first add an item. So it has essentially a zero size.

Initialize() is eventually called with a value of zero the first time you call .Add(), which sets things up.

private void Initialize(int capacity)
{
    int prime = HashHelpers.GetPrime(capacity);
    this.buckets = new int[prime];
    for (int i = 0; i < this.buckets.Length; i++)
    {
        this.buckets[i] = -1;
    }
    this.entries = new Dictionary<TKey, TValue>.Entry[prime];
    this.freeList = -1;
}

GetPrime(0) returns 3, so this.buckets is set to an array containing three integers.

The line that assigns a value to this.entries looks a little odd but I don't see where 100000 comes into this.

Short answer
I think your colleague is wrong.

like image 98
tomfanning Avatar answered Oct 09 '22 00:10

tomfanning


The hash code (ie GetHashCode) is used for placing items in the buckets that the dictionary uses.

The actual capacity used is based upon the number of elements in the dictionary.

The (probably not accurate) pseudo code for what GetHashCode is used for is like so.

List<List<KeyValuePair<T,J>>> buckets; // let's assume this get's allocated somewhere (the dictionary allocates this internally)
...
public J GetValueFromDictionary(T key)
{
    int bucketIndex = key.GetHashCode() % buckets.Length;
    return buckets[bucketIndex].Find(x => x.Key == key).Single().Value;
}
like image 41
Matthew Avatar answered Oct 09 '22 01:10

Matthew