Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Should I use a concatenation of my string fields as a hash code?

I have an Address class in C# that looks like this:

public class Address
{            
    public string StreetAddress { get; set; }
    public string RuralRoute { get; set; }
    public string City { get; set; }
    public string Province { get; set; }
    public string Country { get; set; }
    public string PostalCode { get; set; }
}

I'm implementing equality and so I need to override the hash code. At first I was going to use the hashcode formula from EJ but then I thought: These are all string fields, can't I just just use a StringBuilder to concatenate them and return the hash code from that string?

That is:

var str = new StringBuilder();
str.Append(StreetAddress)
   .Append(RuralRoute)
   ...

return str.ToString().GetHashCode();

What are the advantages/disadvantages of this? Why shouldn't I do it?

like image 671
cdmckay Avatar asked Jun 05 '09 19:06

cdmckay


People also ask

Can two strings have same Hashcode?

Two same strings/value must have the same hashcode, but the converse is not true. There might be another string which can match the same hash-code, so we can't derive the key using hash-code. The reason for two different string to have the same hash-code is due to the collision.

Can Hashcode be a string?

The Java String hashCode() method returns a hash code for the string. A hashcode is a number (object's memory address) generated from any object, not just strings. This number is used to store/retrieve objects quickly in a hashtable. Here, string is an object of the String class.

What is wrong with string concatenation in loop?

If you concatenate Stings in loops for each iteration a new intermediate object is created in the String constant pool. This is not recommended as it causes memory issues.

What is the correct way to concatenate the strings?

You concatenate strings by using the + operator. For string literals and string constants, concatenation occurs at compile time; no run-time concatenation occurs. For string variables, concatenation occurs only at run time.


1 Answers

I would avoid doing that simply on the grounds that it creates a bunch of strings pointlessly - although Kosi2801's point about making collisions simple is also relevant. (I suspect it wouldn't actually create many collisions, due to the nature of the fields, but...)

I would go for the "simple and easy to get right" algorithm I've previously used in this answer (thanks for looking it up lance :) - and which is listed in Effective Java, as you said. In this case it would end up as:

public int GetHashCode()
{
    int hash = 17;
    // Suitable nullity checks etc, of course :)
    hash = hash * 23 + StreetAddress.GetHashCode();
    hash = hash * 23 + RuralRoute.GetHashCode();
    hash = hash * 23 + City.GetHashCode();
    hash = hash * 23 + Province.GetHashCode();
    hash = hash * 23 + Country.GetHashCode();
    hash = hash * 23 + PostalCode.GetHashCode();
    return hash;
}

That's not null-safe, of course. If you're using C# 3 you might want to consider an extension method:

public static int GetNullSafeHashCode<T>(this T value) where T : class
{
    return value == null ? 1 : value.GetHashCode();
}

Then you can use:

public int GetHashCode()
{
    int hash = 17;
    // Suitable nullity checks etc, of course :)
    hash = hash * 23 + StreetAddress.GetNullSafeHashCode();
    hash = hash * 23 + RuralRoute.GetNullSafeHashCode();
    hash = hash * 23 + City.GetNullSafeHashCode();
    hash = hash * 23 + Province.GetNullSafeHashCode();
    hash = hash * 23 + Country.GetNullSafeHashCode();
    hash = hash * 23 + PostalCode.GetNullSafeHashCode();
    return hash;
}

You could create a parameter array method utility to make this even simpler:

public static int GetHashCode(params object[] values)
{
    int hash = 17;
    foreach (object value in values)
    {
        hash = hash * 23 + value.GetNullSafeHashCode();
    }
    return hash;
}

and call it with:

public int GetHashCode()
{
    return HashHelpers.GetHashCode(StreetAddress, RuralRoute, City,
                                   Province, Country, PostalCode);
}

In most types there are primitives involved, so that would perform boxing somewhat unnecessarily, but in this case you'd only have references. Of course, you'd end up creating an array unnecessarily, but you know what they say about premature optimization...

like image 177
Jon Skeet Avatar answered Sep 22 '22 05:09

Jon Skeet