What are hashtables and hashmaps and their typical use cases?

2 Answers

lassevk's answer is very good, but might contain a little too much detail. Here is the executive summary. I am intentionally omitting certain relevant information which you can safely ignore 99% of the time.

There is no important difference between hash tables and hash maps 99% of the time.

Hash tables are magic

Seriously. Its a magic data structure which all but guarantees three things. (There are exceptions. You can largely ignore them, although learning them someday might be useful for you.)

1) Everything in the hash table is part of a pair -- there is a key and a value. You put in and get out data by specifying the key you are operating on.

2) If you are doing anything by a single key on a hash table, it is blazingly fast. This implies that put(key,value), get(key), contains(key), and remove(key) are all really fast.

3) Generic hash tables fail at doing anything not listed in #2! (By "fail", we mean they are blazingly slow.)

When do we use hash tables?

We use hash tables when their magic fits our problem.

For example, caching frequently ends up using a hash table -- for example, let's say we have 45,000 students in a university and some process needs to hold on to records for all of them. If you routinely refer to student by ID number, then a ID => student cache makes excellent sense. The operation you are optimizing for this cache is fast lookup.

Hashes are also extraordinarily useful for storing relationships between data when you don't want to go whole hog and alter the objects themselves. For example, during course registration, it might be a good idea to be able to relate students to the classes they are taking. However, for whatever reason you might not want the Student object itself to know about that. Use a studentToClassRegistration hash and keep it around while you do whatever it is you need to do.

They also make a fairly good first choice for a data structure except when you need to do one of the following:

When Not To Use Hash Tables

Iterate over the elements. Hash tables typically do not do iteration very well. (Generic ones, that is. Particular implementations sometimes contain linked lists which are used to make iterating over them suck less. For example, in Java, LinkedHashMap lets you iterate over keys or values quickly.)

Sorting. If you can't iterate, sorting is a royal pain, too.

Going from value to key. Use two hash tables. Trust me, I just saved you a lot of pain.

answered Oct 02 '22 19:10

Patrick McKenzie

Well, think of it this way.

If you use an array, a simple index-based data structure, and fill it up with random stuff, finding a particular entry gets to be a more and more expensive operation as you fill it with data, since you basically have to start searching from one end toward the other, until you find the one you want.

If you want to get faster access to data, you typicall resort to sorting the array and using a binary search. This, however, while increasing the speed of looking up an existing value, makes inserting new values slow, as you need to move existing elements around when you need to insert an element in the middle.

A hashtable, on the other hand, has an associated function that takes an entry, and reduces it to a number, a hash-key. This number is then used as an index into the array, and this is where you store the entry.

A hashtable revolves around an array, which initially starts out empty. Empty does not mean zero length, the array starts out with a size, but all the elements in the array contains nothing.

Each element has two properties, data, and a key that identifies the data. For instance, a list of zip-codes of the US would be a zip-code -> name type of association. The function reduces the key, but does not consider the data.

So when you insert something into the hashtable, the function reduces the key to a number, which is used as an index into this (empty) array, and this is where you store the data, both the key, and the associated data.

Then, later, you want to find a particular entry that you know the key for, so you run the key through the same function, get its hash-key, and goes to that particular place in the hashtable and retrieves the data there.

The theory goes that the function that reduces your key to a hash-key, that number, is computationally much cheaper than the linear search.

A typical hashtable does not have an infinite number of elements available for storage, so the number is typically reduced further down to an index which fits into the size of the array. One way to do this is to simply take the modulus of the index compared to the size of the array. For an array with a size of 10, index 0-9 will map directly to an index, and index 10-19 will map down to 0-9 again, and so on.

Some keys will be reduced to the same index as an existing entry in the hashtable. At this point the actual keys are compared directly, with all the rules associated with comparing the data types of the key (ie. normal string comparison for instance). If there is a complete match, you either disregard the new data (it already exists) or you overwrite (you replace the old data for that key), or you add it (multi-valued hashtable). If there is no match, which means that though the hash keys was identical, the actual keys were not, you typically find a new location to store that key+data in.

Collision resolution has many implementations, and the simplest one is to just go to the next empty element in the array. This simple solution has other problems though, so finding the right resolution algorithm is also a good excercise for hashtables.

Hashtables can also grow, if they fill up completely (or close to), and this is usually done by creating a new array of the new size, and calculating all the indexes once more, and placing the items into the new array in their new locations.

The function that reduces the key to a number does not produce a linear value, ie. "AAA" becomes 1, then "AAB" becomes 2, so the hashtable is not sorted by any typical value.

There is a good wikipedia article available on the subject as well, here.

132

answered Oct 02 '22 18:10

Lasse V. Karlsen

Related questions
                            
                                Code Golf: Number to Words
                            
                                3 dimensional bin packing algorithms
                            
                                What is the difference between a mixin and the decorator pattern?
                            
                                Markov Decision Process: value iteration, how does it work?
                            
                                Puzzle: Find largest rectangle (maximal rectangle problem)
                            
                                Is bit shifting O(1) or O(n)?
                            
                                Flow Based Programming
                            
                                command pattern returning status
                            
                                How should release notes be written?
                            
                                Efficiently selecting a set of random elements from a linked list
                            
                                Difference between Left Factoring and Left Recursion
                            
                                Is Sleep() evil?
                            
                                Do all programming languages have boolean short-circuit evaluation?
                            
                                What is the purpose of null?
                            
                                How to check if line segment intersects a rectangle?
                            
                                How can a developer learn about web design? [closed]
                            
                                Good examples, articles, books for understanding dynamic programming [closed]
                            
                                How Could One Implement the K-Means++ Algorithm?
                            
                                How do I display the binary representation of a float or double?
                            
                                Have you ever restricted yourself to using a subset of language features?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What are hashtables and hashmaps and their typical use cases?

Tags:

language-agnostic

hashmap

hashtable

Kamil Zadora

People also ask