Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best practices on what should be key in a hashtable

The best look-up structure is a HashTable. It provides constant access on average (linear in worst case).
This depends on the hash function. Ok.
My question is the following. Assuming a good implementation of a HashTable e.g. HashMap is there a best practice concerning the keys passed in the map?I mean it is recommended that the key must be an immutable object but I was wondering if there are other recommendations.
Example the size of the key? For example in a good hashmap (in the way described above) if we used String as keys, won't the "bottleneck" be in the string comparison for equals (trying to find the key)? So should the keys be kept small? Or are there objects that should not be used as keys? E.g. a URL? In such cases how can you choose what to use as a key?

like image 976
Cratylus Avatar asked Jan 16 '13 20:01

Cratylus


2 Answers

The best performing key for an HashMap is probably an Integer, where hashCode() and equals() are implemented as:

public int hashCode() {
    return value;
}

public boolean equals(Object obj) {
    if (obj instanceof Integer) {
        return value == ((Integer)obj).intValue();
    }
    return false;
}

Said that, the purpose of an HashMap is to map some object (value) to some others (key). The fact that a hash function is used to address the (value) objects is to provide fast, constant-time access.

it is recommended that the key must be an immutable object but I was wondering if there are other recommendations.

The recommendation is to Map objects to what you need: don't think what is faster; but think what is the best for your business logic to address the objects to retrieve.

The important requirement is that the key object must be immutable, because if you change the key object after storing it in the Map it may be not possible to retrieve the associated value later.

The key word in HashMap is Map. Your object should just map. If you sacrifice the mapping task optimizing the key, you are defeating the purpose of the Map - without probably achieving any performance boost.

I 100% agree with the first two comments in your question:

the major constraint is that it has to be the thing that you want to base the lookup on ;)
– Oli Charlesworth

The general rule is to use as the key whatever you need to look up with.
– Louis Wasserman

Remember the two rules for optimization:

  1. Don't.
  2. (for experts only) don't yet.

The third rule is: profile before to optimize.

like image 148
Luigi R. Viggiano Avatar answered Oct 05 '22 01:10

Luigi R. Viggiano


You should use whatever key you want to use to lookup things in the data structure, it's typically a domain-specific constraint. With that said, keep in mind that both hashCode() and equals() will be used in finding a key in the table.

hashCode() is used to find the position of the key, while equals() is used to determine if the key you are searching for is actually the key that we just found using hashCode().

For example, consider two keys a and b that have the same hash code in a table using separate chaining. Then a search for a would require testing if a.equals(key) for potentially both a and b in the table once we find the index of the list containing a and b from hashCode().

like image 26
Alex DiCarlo Avatar answered Oct 05 '22 01:10

Alex DiCarlo