
Why hash maps in Java 8 use binary tree instead of linked list? [closed]

I recently learned that in Java 8 hash maps use a binary tree instead of a linked list once a bucket grows large, with the hash code used as the branching factor. I understand that in the case of heavy collisions this reduces lookup from O(n) to O(log n). My question is: what good does that really do, given that the amortized time complexity is still O(1)? If you forced all entries into the same bucket by giving every key the same hash code you could see a significant time difference, but no one in their right mind would do that.

A binary tree also uses more space than a singly linked list, since it stores both left and right child references. Why increase the space complexity when there is virtually no improvement in time complexity except in some contrived test cases?
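(For illustration of that worst case, here is a hypothetical key class of my own devising, BadKey, that pins every entry into a single bucket. It implements Comparable so a Java 8 tree bin can actually order the entries; the timing is indicative only.)

```java
import java.util.HashMap;
import java.util.Map;

public class WorstCaseDemo {
    // Hypothetical key whose hashCode is constant, so every instance
    // lands in the same bucket. Comparable lets the Java 8 tree bin
    // order the entries by their natural order once the bin treeifies.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }   // force one bucket
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 100_000; i++) {
            map.put(new BadKey(i), i);      // every entry collides
        }
        long start = System.nanoTime();
        for (int i = 0; i < 100_000; i++) {
            map.get(new BadKey(i));         // O(log n) per lookup in Java 8,
        }                                   // O(n) per lookup before Java 8
        System.out.printf("lookups took %d ms%n",
                (System.nanoTime() - start) / 1_000_000);
    }
}
```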

asked Mar 09 '16 by user1613360

People also ask

Which binary tree is used in HashMap in Java 8?

The HashMap class uses a nested Node class for holding key-value entries. Its members are: final int hash; final K key; V value; Node<K,V> next;. The binary tree implementation is a red-black tree: a sorted, self-balancing tree in which a search takes at most O(log n) operations.
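For reference, here is a simplified sketch of these classes, paraphrased from the OpenJDK source (the real Node also implements Map.Entry<K,V> with accessor methods, and TreeNode carries more bookkeeping than shown):

```java
// Simplified sketch of java.util.HashMap.Node from OpenJDK.
class Node<K, V> {
    final int hash;   // cached hash of the key
    final K key;      // the key itself
    V value;          // current mapped value
    Node<K, V> next;  // next entry in the same bucket's chain

    Node(int hash, K key, V value, Node<K, V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }
}

// When a bin is treeified, entries become TreeNodes, which add
// red-black tree links on top of the plain Node fields.
class TreeNode<K, V> extends Node<K, V> {
    TreeNode<K, V> parent, left, right; // tree links
    boolean red;                        // red-black colour bit

    TreeNode(int hash, K key, V value, Node<K, V> next) {
        super(hash, key, value, next);
    }
}
```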

Why is a binary tree better than a hash table?

Binary search trees are generally memory-efficient since they do not reserve more memory than they need to. On the other hand, hash tables can be a bit more demanding if we don't know the exact number of elements we want to store.

Why does HashMap use linked list?

That's because it has both an array and linked lists inside (Duh!). The array holds the buckets, indexed by the key's hash, and each bucket is a linked list of nodes, each node storing the hash, the key, the value, and a link to the next node.

Do Hashmaps use linked lists?

It actually uses a singly linked list implemented by chaining the hash table entries. (By contrast, a LinkedList is doubly linked, and it requires a separate Node object for each element in the list.)


2 Answers

This is mostly a security-related change. While in a normal situation it's rare to have many collisions, if hash keys arrive from an untrusted source (e.g. HTTP header names received from the client), then it's possible and not very hard to specially craft the input so that the resulting keys all have the same hash code. If you then perform many look-ups, you may experience a denial of service. It appears that there's quite a lot of code in the wild which is vulnerable to this kind of attack, thus it was decided to fix this on the Java side.
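To see how easy such crafting is: "Aa" and "BB" have the same String.hashCode() (2112), so concatenating n such pairs yields 2^n distinct keys that all share one hash code. The helper below is my own sketch of the idea, not code from JEP-180:

```java
import java.util.ArrayList;
import java.util.List;

public class CollidingKeys {
    // "Aa" and "BB" hash to the same value (2112), so every string
    // built from n such pairs collides with all 2^n - 1 others --
    // exactly the kind of input an attacker can send as, e.g.,
    // HTTP header or parameter names.
    static List<String> collisions(int pairs) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < pairs; i++) {
            List<String> next = new ArrayList<>(keys.size() * 2);
            for (String k : keys) {
                next.add(k + "Aa");
                next.add(k + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = collisions(12); // 4096 distinct keys, one hash code
        System.out.println(
            keys.get(0).hashCode() == keys.get(keys.size() - 1).hashCode()); // true
    }
}
```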

For more information refer to JEP-180.

answered Nov 02 '22 by Tagir Valeev


Your question contains some wrong premises.

  1. A bucket collision is not necessarily a hash collision. You don't need two objects with the same hash code for them to end up in the same bucket. A bucket is an element of an array, and the hash code has to be mapped to a particular index (a minimal sketch of this index computation follows this list). Since the array size should be reasonable with respect to the Map's size, you can't raise the array size arbitrarily to avoid bucket collisions. There's even the theoretical limitation that the array size can be at most 2³¹ while there are 2³² possible hash codes.
  2. Having a hash collision is not a sign of bad programming. For any type whose value space is larger than 2³², distinct objects with the same hash code are unavoidable. Strings are an obvious example, but even a Point bearing two int values or a plain Long key has unavoidable hash collisions. So collisions may be more common than you think, and how common heavily depends on the use case.
  3. The implementation switches to a binary tree only when the number of collisions in a bucket exceeds a certain threshold, so the higher memory costs only apply when they will pay off. There also seems to be a common misunderstanding regarding how these tree bins work. Since bucket collisions are not necessarily hash collisions, the binary search first compares the hash codes. Only when the hash codes are identical and the key appropriately implements Comparable is its natural order used. The examples you might have found on the web deliberately use the same hash code for all objects to demonstrate the use of the Comparable implementation, which otherwise would not show up. What they trigger is only the last resort of the implementation.
  4. As Tagir pointed out, this issue might affect the security of software, as a slow fallback may open the possibility of DoS attacks. There have been several attempts in previous JRE versions to address this issue which had more drawbacks than the memory consumption of a binary tree. For example, there was an attempt to randomize the mapping of hash codes to array entries in Java 7, which caused an initialization overhead as documented in this bug report. This is not the case with the new implementation.
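To make point 1 concrete, here is a minimal sketch of how a hash code is mapped to a bucket index. The body of spread mirrors HashMap.hash(Object) in the OpenJDK source; the demo class and names around it are mine. Table lengths are always powers of two, so the index is just a mask of the low bits:

```java
public class BucketIndexDemo {
    // Same logic as java.util.HashMap.hash(Object) in OpenJDK:
    // XOR the upper 16 bits into the lower 16 so that small tables
    // still feel the influence of the high bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16; // default initial table capacity (a power of two)

        // Hash collision: distinct keys, identical hash codes (2112).
        System.out.println(spread("Aa") & (n - 1)); // 0
        System.out.println(spread("BB") & (n - 1)); // 0 -- same hash, same bucket

        // Bucket collision without a hash collision: "x".hashCode() is 120,
        // Integer.valueOf(136).hashCode() is 136; 120 & 15 == 136 & 15 == 8.
        System.out.println(spread("x") & (n - 1)); // 8
        System.out.println(spread(136) & (n - 1)); // 8 -- different hashes, same bucket
    }
}
```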
answered Nov 02 '22 by Holger