Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementation of Object.GetHashCode()

I'm reading Effective C# and there is a comment about Object.GetHashCode() that I didn't understand:

Object.GetHashCode() uses an internal field in the System.Object class to generate the hash value. Each object created is assigned a unique object key, stored as an integer, when it is created.
These keys start at 1 and increment every time a new object of any type gets created. The object identity field is set in the System.Object constructor and cannot be modified later. Object.GetHashCode() returns this value as the hash code for a given object.

I tried to look at the documentation of Object.GetHashCode() and didn't find any information about this.

I wrote the simple piece of code to print the hash code of newly generated objects:

using System;

namespace TestGetHashCode
{
    class Program
    {
        static void Main(string[] args)
        {
            for (int i = 0; i < 100; i++)
            {
                object o = new object();
                Console.WriteLine(o.GetHashCode());
            }
        }
    }
}

The first few numbers that were printed were:

37121646,
45592480,
57352375,
2637164,
41014879,
3888474,
25209742,
26966483,
31884011

Which didn't seem to fit that

These keys start at 1 and increment every time a new object of any type gets created...Object.GetHashCode() returns this value

Then, in order to find this "internal field in the System.Object" I tried using ReSharper decompiled sources but the code I found was

[TargetedPatchingOptOut("Performance critical to inline across NGen image boundaries")]
[__DynamicallyInvokable]
public virtual int GetHashCode()
{
  return RuntimeHelpers.GetHashCode(this);
}

and again using decompiled sources I found that RuntimeHelpers.GetHashCode was implemented as

[SecuritySafeCritical]
[__DynamicallyInvokable]
[MethodImpl(MethodImplOptions.InternalCall)]
public static int GetHashCode(object o);

following the MethodImpl attribute it seems that I can't view the implementation and this is a dead end for me.

Can someone please explain the comment by the author (the first quote) ?

What is the internal field within the Object class and how it is used for the implementation of the Object.GetHashCode()?

like image 715
Belgi Avatar asked Nov 28 '14 20:11

Belgi


People also ask

Do I need to implement GetHashCode?

Why is it important to override GetHashCode ? It s important to implement both equals and gethashcode, due to collisions, in particular while using dictionaries. if two object returns same hashcode, they are inserted in the dictionary with chaining. While accessing the item equals method is used.

What does GetHashCode return?

This is kind of obvious, when you think about it: GetHashCode returns an Int32 , which has “only” about 4.2 billion possible values, and there's potentially an infinity of different objects, so some of them are bound to have the same hash code. And no, it can't be used to test if two objects are equal.

What is return type of GetHashCode C#?

GetHashCode() Method is used to return the hash code for this instance. Syntax: public override int GetHashCode (); Return Value: This method returns the hash code for the current instance.

How does hashCode () method work in Java?

Simply put, hashCode() returns an integer value, generated by a hashing algorithm. Objects that are equal (according to their equals()) must return the same hash code. Different objects do not need to return different hash codes.


1 Answers

Okay, I'd better write this up. The book is very inaccurate. The value for Object.GetHashCode() is generated inside the CLR and is calculated on demand, whenever GetHashCode() is called the first time. I'll quote the code from the SSCLI20 distribution, clr/src/vm/thread.h has the function that produces the number, it looks like this (edited for readability):

inline DWORD GetNewHashCode()
{
    // Every thread has its own generator for hash codes so that we won't get into a 
    // situation where two threads consistently give out the same hash codes.
    // Choice of multiplier guarantees period of 2**32
    // see Knuth Vol 2 p16 (3.2.1.2 Theorem A).
    DWORD multiplier = m_ThreadId*4 + 5;
    m_dwHashCodeSeed = m_dwHashCodeSeed*multiplier + 1;
    return m_dwHashCodeSeed;
}

After which it is stored in the so-called sync block of the object so subsequent calls return the same value. Only 26 of the generated 32 bits are actually stored, the sync block needs space for some status bits. Still plenty good enough to generate a very high quality hash code, collisions are quite rare.

The presence of the m_ThreadId variable in that code can use an explanation. The random number generator seed is stored for each individual thread. A trick to avoid having to take a lock.

The m_dwHashCodeSeed is initialized in the Thread constructor like this:

   // Initialize this variable to a very different start value for each thread
   // Using linear congruential generator from Knuth Vol. 2, p. 102, line 24
   dwHashCodeSeed = dwHashCodeSeed * 1566083941 + 1;
   m_dwHashCodeSeed = dwHashCodeSeed;

with:

   static  DWORD dwHashCodeSeed = 123456789;
like image 155
Hans Passant Avatar answered Sep 22 '22 18:09

Hans Passant