Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dictionary <,> Size , GetHashCode and Prime Numbers?

I've been reading quite a lot about this interesting topic (IMO). but I'm not fully understand one thing :

Dictionary size is increasing its capacity ( doubles to the closest prime number) to a prime number (when reallocation) : because :

int index = hashCode % [Dictionary Capacity];
  • So we can see that prime numbers are used here for [Dictionary Capacity] because their GreatestCommonFactor is 1. and this helps to avoid collisions.

In addition

I've seen many samples of implementing theGetHashCode() :

Here is a sample from Jon Skeet :

public override int GetHashCode()
{
    unchecked 
    {
        int hash = 17;
        // Suitable nullity checks etc, of course :)
        hash = hash * 23 + field1.GetHashCode();
        hash = hash * 23 + field2.GetHashCode();
        hash = hash * 23 + field3.GetHashCode();
        return hash;
    }
}

I don't understand :

Question

Does prime numbers are used both in : Dictionary capacity and in the generation of getHashCode ?

Because in the code above , there is a good chance that the return value will not be a prime number [please correct me if i'm wrong] because of the

  • multiplication by 23
  • addition of the GetHashCode() value for each field.

For Example : (11,17,173 are prime number)

        int hash = 17;
        hash = hash * 23 + 11; //402
        hash = hash * 23 + 17; //9263
        hash = hash * 23 + 173 //213222
        return hash;

213222 is not a prime.

Also there is not any math rule which state :

(not a prime number) + (prime number) = (prime number)

nor

(not a prime number) * (prime number) = (prime number)

nor

(not a prime number) * (not a prime number) = (prime number)

So what am I missing ?

like image 994
Royi Namir Avatar asked Feb 20 '13 16:02

Royi Namir


People also ask

Why hashtable size is prime?

They famously are only divisible by 1 and themselves. Thus, choosing to set your hash table length to a large prime number will greatly reduce the occurrence of collisions.

What is the vocabulary for prime number?

A prime number is a whole number greater than 1 whose only factors are 1 and itself. A factor is a whole number that can be divided evenly into another number. The first few prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, 23 and 29. Numbers that have more than two factors are called composite numbers.


1 Answers

It does not matter what the result of GetHashCode is (it does not have to be prime at all), as long as the result is the same for two objects that are considered to be equal. However, it is nice (but not required) to have GetHashCode return a different value for two objects that are considered to be different (but still not necessarily prime).

Given two numbers a and b, when you multiply them you get c = a * b. There are usually multiple different pairs of a and b that give the same result c. For example 6 * 2 = 12 and 4 * 3 = 12. However, when a is a prime number, there are a lot less pairs that give the same result. This is convenient for the property that the hash code should be different for different objects.

In the dictionary the same principle applies: the objects are put in buckets depending on their hash. Since most integers do not divide nicely by a prime number, you get a nice spreading of your objects in the buckets. Ideally you'd want only one item in each bucket for optimal dictionary performance.


Slightly off-topic: Cicada's (that's an insect) use prime numbers to determine after how many years they go and mate again. Since this mating cycle is a prime number of years, the chances of the mating continously coinciding with the life cycles of any of its enemies are slim.

like image 172
Daniel A.A. Pelsmaeker Avatar answered Sep 30 '22 17:09

Daniel A.A. Pelsmaeker