Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java's HashMap collision resolution

Tags:

java

hashmap

So from what I've read on stackoverflow and other websites. Java uses linkedlists for hash collision resolution.

This would guarantee a O(n) complexity for worst case scenarios of inserting, getting and, removing.

Why does Java not use a self balancing BST (Like AVL, Red Black, etc...) to guarantee a O(log n) complexity for worst case scenarios of inserting, getting, and, removing?

like image 209
Amir Omidi Avatar asked Oct 31 '16 01:10

Amir Omidi


2 Answers

Java 8 does, if a linked chain gets big enough.

For small numbers of elements, though, it adds significant memory and performance overhead. A linked list is really extremely efficient for very small numbers of elements, which are exactly what you expect for hash buckets in 99% of situations. Additionally, defining how, exactly, the binary tree should be sorted is non-obvious, and has to special-case when the elements are Comparable, sorting by the unused hash code bits...it gets hairy.

like image 75
Louis Wasserman Avatar answered Oct 19 '22 19:10

Louis Wasserman


Most of the time there's going to be a very small number of items in a bucket; often zero or one. In these cases, a simple hash bucket structure is able to guarantee O(1); an O(log n) BST might cut off time in some sub-optimal edge cases, but the performance gain is negligible at best and negative at worst. There's also significant memory overhead. Java 8 does make an effort to detect when a linked list is no longer optimal and convert to a BST; however, if this behavior occurs frequently, it's a sign that hashes and HashMap are being used incorrectly.

There are a lot of implementation details available when reading the source code for the JDK. Here's a brief excerpt from the top of Oracle's java.util.HashMap:

/*
 * Implementation notes.
 *
 * This map usually acts as a binned (bucketed) hash table, but
 * when bins get too large, they are transformed into bins of
 * TreeNodes, each structured similarly to those in
 * java.util.TreeMap. Most methods try to use normal bins, but
 * relay to TreeNode methods when applicable (simply by checking
 * instanceof a node).  Bins of TreeNodes may be traversed and
 * used like any others, but additionally support faster lookup
 * when overpopulated. However, since the vast majority of bins in
 * normal use are not overpopulated, checking for existence of
 * tree bins may be delayed in the course of table methods.
 * [...]

Looking at the implementations of HashMap#getNode and HashMap.Node, we can see that each bucket starts as a very simple linked list--simpler than java.util.LinkedList, which is actually a doubly-linked list.

Per the comment, when a list grows to a certain size, it's converted to a tree. It's hard to tell exactly what's going on in HashMap.TreeNode because the code isn't exactly self-descriptive, but it appears to be a simple red-black BST.

like image 2
Zenexer Avatar answered Oct 19 '22 19:10

Zenexer