Let's say that I need to make a mapping from String
to an integer. The integers are unique and form a continuous range starting from 0. That is:
Hello -> 0
World -> 1
Foo -> 2
Bar -> 3
Spam -> 4
Eggs -> 5
etc.
There are at least two straightforward ways to do it. With a hashmap:
HashMap<String, Integer> map = ...
int integer = map.get(string); // Plus maybe null check to avoid NPE in unboxing.
Or with a list:
List<String> list = ...
int integer = list.indexOf(string); // Plus maybe check for -1.
Which approach should I use, and why? Arguably the relative performance depends on the size of the list/map, since List#indexOf()
is a linear search using String#equals()
-> O(n) efficiency, while HashMap#get()
uses hash to narrow down the search -> certainly more efficient when the map is big, but maybe inferior when there are just few elements (there must be some overhead in calculating the hash, right?).
Since benchmarking Java code properly is notoriously hard, I would like to get some educated guesses. Is my reasoning above correct (list is better for small, map is better for large)? What is the threshold size approximately? What difference do various List
and HashMap
implementations make?
A third option and possibly my favorite would be to use a trie:
I bet it beats the HashMap
in performance (no collisions + the fact that computing the hash-code is O(length of string)
anyway), and possibly also the List
approach in some cases (such as if your strings have long common prefixes, as the indexOf would waste lot of time in the equals
methods).
When choosing between List and Map I would go for a Map
(such as HashMap
). Here is my reasoning:
Readability
The Map interface simply provides a more intuitive interface for this use case.
Optimization in the right place
I'd say if you're using a List
you would be optimizing for the small cases anyway. That's probably not where the bottle neck is.
A fourth option would be to use a LinkedHashMap
, iterate through it if the size is small, and get
the associated number if the size is large.
A fifth option is to encapsulate the decision in a separate class all together. In this case you could even implement it to change strategy in runtime as the list grows.
You're right: a List would be O(n), a HashMap would be O(1), so a HashMap would be faster for n large enough so that the time to calculate the hash didn't swamp the List linear search.
I don't know the threshold size; that's a matter for experimentation or better analytics than I can muster right now.
Your question is totally correct on all points:
HashMap
s are better (they use a hash)But at the end of the day, you're just going to have to benchmark your particular application. I don't see why HashMaps would be slower for small cases but the benchmarking will give you the answer if it is or not.
One more option, a TreeMap
is another map data structure which uses a tree as opposed to a hash to access the entries. If you are doing benchmarking, you might as well benchmark that as well.
Regarding benchmarking, one of the main problems is the garbage collector. However if you do a test which doesn't allocate any objects, that shouldn't be a problem. Fill up your map/list, then just write a loop to get N random elements, and then time it, that should be reasonably reproducable and therefore informative.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With