Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

List<T> capacity increasing vs Dictionary<K,V> capacity increasing?

Tags:

c#

.net

Why does List<T> increase its capacity by a factor of 2?

private void EnsureCapacity(int min)
{
    if (this._items.Length < min)
    {
        int num = (this._items.Length == 0) ? 4 : (this._items.Length * 2);
        if (num < min)
        {
            num = min;
        }
        this.Capacity = num;
    }
}

Why does Dictionary<K,V> use prime numbers as capacity?

private void Resize()
{
    int prime = HashHelpers.GetPrime(this.count * 2);
    int[] numArray = new int[prime];
    for (int i = 0; i < numArray.Length; i++)
    {
        numArray[i] = -1;
    }
    Entry<TKey, TValue>[] destinationArray = new Entry<TKey, TValue>[prime];
    Array.Copy(this.entries, 0, destinationArray, 0, this.count);
    for (int j = 0; j < this.count; j++)
    {
        int index = destinationArray[j].hashCode % prime;
        destinationArray[j].next = numArray[index];
        numArray[index] = j;
    }
    this.buckets = numArray;
    this.entries = destinationArray;
}

Why doesn't it also just multiply by 2? Both are dealing with finding continues memory location...correct?

like image 537
Royi Namir Avatar asked Jan 30 '13 08:01

Royi Namir


People also ask

What is the default capacity of a dictionary C#?

It will create a dictionary with an initial default capacity of 5. We can create the dictionary by using the different constructors of the dictionary.

How c# dictionary works?

In C#, Dictionary is a generic collection which is generally used to store key/value pairs. The working of Dictionary is quite similar to the non-generic hashtable. The advantage of Dictionary is, it is generic type. Dictionary is defined under System.


3 Answers

It's common to use prime numbers for hash table sizes because it reduces the probability of collisions.

Hash tables typically use the modulo operation to find the bucket where an entry belongs, as you can see in your code:

int index = destinationArray[j].hashCode % prime;

Suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this). Now you can do one of the following to avoid clustering:

  1. Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions of entries.

  2. Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length is a prime number.

(from http://srinvis.blogspot.com/2006/07/hash-table-lengths-and-prime-numbers.html)

HashHelpers.GetPrime(this.count * 2) 

should return a prime number. Look at the definition of HashHelpers.GetPrime().

like image 85
Martin Konicek Avatar answered Oct 22 '22 06:10

Martin Konicek


Dictionary puts all its objects into buckets depending on their GetHashCode value, i.e.
Bucket[object.GetHashCode() % DictionarySize] = object;
It uses a prime number for size to avoid the chance of collisions. Presumably a size with many divisors would be bad for poorly designed hash codes.

like image 35
HugoRune Avatar answered Oct 22 '22 07:10

HugoRune


From a question in SO;

Dictionary or hash table relies on hashing the key to get a smaller index to look up into corresponding store (array). So choice of hash function is very important. Typical choice is to get hash code of a key (so that we get good random distribution) and then divide the code by a prime number and use reminder to index into fixed number of buckets. This allows to convert arbitrarily large hash codes into a bounded set of small numbers for which we can define an array to look up into. So its important to have array size in prime number and then the best choice for the size become the prime number that is larger than the required capacity. And that's exactly dictionary implementation does.

List<T> employs arrays to store data; and increasing the capacity of an array requires copying the array to a new memory location; which is time consuming. I guess, in order to lower the occurence of copying arrays, list doubles it's capacity.

like image 1
daryal Avatar answered Oct 22 '22 07:10

daryal