Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Question about GetHashCode implementation

http://msdn.microsoft.com/en-us/library/system.object.gethashcode(VS.80).aspx says:

For the best performance, a hash function must generate a random distribution for all input.

Does it have any effect in performance or it's ok to use a function (like return this.Id) that doesn't give a "random distribution" but it doesn't cause more collisions?

like image 889
ggf31416 Avatar asked May 29 '26 19:05

ggf31416


2 Answers

return this.Id would usually be fine (especially if Id is immutable and unique) - the main idea is to avoid collisions. However, think also about pending data - what is the Id of the 27 rows you haven't saved yet?

Also note that the GetHashCode and Equals implementations must agree.

like image 87
Marc Gravell Avatar answered Jun 01 '26 10:06

Marc Gravell


Using this.Id is generally okay. The underpinning is that you don't want too many collisions which would end up in the same bucket. The bucket number is generally obtained by taking the hash code and considering it "mod x" where x is the number of buckets in your hash table, and is usually prime (or a probable prime).

If you're just using increasing IDs (1, 2, 3, 4...) that's going to end up being pretty random as far as the bucket distribution is concerned. It's only if your ID followed a pattern which might well give the same bucket number for lots of entries that you'd need to worry.

like image 34
Jon Skeet Avatar answered Jun 01 '26 08:06

Jon Skeet