I was wondering what the complexity of replace(Key, Value) for a HashMap is. My initial thought is O(1), since it's O(1) to get the value and I can simply replace the value assigned to the key. I'm unsure whether I should take into account the collisions that there might be in a large HashMap implemented in Java with java.util.
HashMap put and get operations have O(1) time complexity, under the assumption that key-value pairs are well distributed across the buckets.

It is O(1) only if your hashing function is very good. The Java hash table implementation does not protect against bad hash functions. Whether you need to grow the table when you add items is not relevant to the question, because it is about lookup time.

Hash tables seem to be O(1) because they have a small constant factor, and the 'n' in their O(log n) behavior is stretched to the point that, for many practical applications, performance is independent of the actual number of items you are using.
On average, the time complexity of HashMap insertion, deletion, and search is O(1) constant time. That said, in the worst case, Java takes O(n) time for searching, insertion, and deletion. Java uses chaining and rehashing to handle collisions.

HashMap has a hash collision problem: in the worst case, all keys are allocated to the same bucket, and the time complexity of put and get becomes O(n). Therefore, a problem that HashMap's designers had to consider is how to reduce hash collisions.

What is the complexity of containsKey() in HashMap? Generally, if there is no collision on the hash of the key, the complexity of containsKey is O(1). This can be understood by looking at how the method is implemented: the HashMap contains an array of nodes (the buckets).

A HashMap in Java is an implementation of the hash table data structure, whose purpose is constant O(1) running time for commonly used operations like put() and get(). However, O(1) is not always possible for get() due to hash collisions.

With open addressing, the space complexity of a hash map remains O(n), where n is the number of items inserted, since every element stored in the table must have some memory associated with it.
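To see the difference between a good and a bad hash function in action, here is a small, unscientific timing sketch; the class and key names (BadHashDemo, BadKey) are made up for illustration:

import java.util.HashMap;

public class BadHashDemo {

    // Deliberately bad key: every instance hashes to the same bucket.
    static final class BadKey {
        final int id;
        BadKey(final int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(final Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(final String[] args) {
        final HashMap<BadKey, Integer> bad = new HashMap<>();
        final HashMap<Integer, Integer> good = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            bad.put(new BadKey(i), i);   // all 10,000 keys share one bucket
            good.put(i, i);              // Integer keys spread across buckets
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < 10_000; i++) bad.get(new BadKey(i));
        long t1 = System.nanoTime();
        for (int i = 0; i < 10_000; i++) good.get(i);
        long t2 = System.nanoTime();

        System.out.println("bad keys:  " + (t1 - t0) + " ns");
        System.out.println("good keys: " + (t2 - t1) + " ns");
    }
}

Since BadKey is not Comparable, Java cannot even fully order the collided bucket once it is treeified, so the gap between the two timings is typically large (exact numbers depend on your machine and JVM).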
HashMap#replace runs in O(1) amortized; and, under the premise that the map is properly balanced (which Java takes care of during your put and remove calls), also non-amortized. Whether it also holds for a non-amortized analysis hinges on the implemented self-balancing mechanism.
Basically, since replace only changes the value, which influences neither hashing nor the general structure of the HashMap, replacing a value will not trigger any re-hashing or re-organization of the internal structure. Hence we only pay for the cost of locating the key, which depends on the bucket size. The bucket size, if the map is properly self-balanced, can be considered a constant, leading to O(1) for replace also non-amortized.
However, the implementation triggers self-balancing and re-hashing based on heuristic factors only. A deep analysis of that is a bit more complex.
So the reality is probably somewhere in between due to the heuristics.
To be sure, let us take a look at the current implementation (Java 16):
@Override
public V replace(K key, V value) {
    Node<K,V> e;
    if ((e = getNode(key)) != null) {
        V oldValue = e.value;
        e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }
    return null;
}
The method afterNodeAccess is a dummy for subclasses and is empty in HashMap. Everything except getNode trivially runs in O(1).
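For completeness, here is a tiny usage sketch of the contract that this code implements (the class name ReplaceDemo is just for illustration): the old value is returned on a successful replace, and an absent key is left untouched:

import java.util.HashMap;

public class ReplaceDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        Integer old = map.replace("a", 2);  // returns 1; map is now {a=2}
        Integer none = map.replace("b", 3); // returns null; no mapping is added

        System.out.println(old + ", " + none + ", " + map);
    }
}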
getNode
getNode is the canonical implementation of locating an entry in a HashMap, which we know runs in O(1) for a properly self-balancing map like Java's implementation. Let's take a look at the code:
/**
 * Implements Map.get and related methods.
 *
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n, hash; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & (hash = hash(key))]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
This method basically computes the hash (hash = hash(key)), then looks up the bucket in the table (first = tab[(n - 1) & (hash = hash(key))]) and starts iterating through the data structure stored in that bucket. Regarding the bucket's data structure, we have a little branching going on at if (first instanceof TreeNode): the buckets are either simple, implicitly linked lists or red-black trees.
For the linked list, we have a straightforward iteration

    do {
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    } while ((e = e.next) != null);

which obviously runs in O(m), with m being the size of the linked list.
For the red-black tree, we have

    return ((TreeNode<K,V>)first).getTreeNode(hash, key);

Lookup in a red-black tree is O(log m), with m being the size of the tree.
Java's implementation makes sure to re-balance the buckets by rehashing if it detects that things get out of hand (you pay for that on each modifying method like put or remove). So in both cases we can consider the size of the buckets as constant or, due to the heuristics involved with self-balancing, close to a constant.
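To make the rehashing heuristic concrete, here is a minimal sketch; the capacity and load factor below are simply HashMap's documented defaults, written out explicitly, and the class name is invented:

import java.util.HashMap;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Initial capacity 16, load factor 0.75 (these are the defaults,
        // spelled out here for illustration).
        HashMap<Integer, Integer> map = new HashMap<>(16, 0.75f);
        for (int i = 1; i <= 13; i++) {
            map.put(i, i);
        }
        // The 13th put exceeds the threshold 16 * 0.75 = 12, so the table
        // doubles to 32 internally and all entries are redistributed,
        // keeping the average bucket short.
        System.out.println(map.size()); // 13
    }
}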
Having the buckets at constant size effectively makes getNode run in O(1), leading to replace running in O(1) as well. Without any self-balancing mechanism, the worst case would degrade to O(n) if a linked list is used, and to O(log n) for a red-black tree (for the case that all keys yield a hash collision).
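That O(log n) fallback is observable with a sketch like the following (all names invented for the demo): keys that always collide but implement Comparable can be fully ordered in the treeified bucket, so lookups stay logarithmic:

import java.util.HashMap;

public class TreeBinDemo {
    // All keys collide on purpose; because the key is Comparable, large
    // bins are treeified and lookups stay O(log n) instead of O(n).
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;
        CollidingKey(final int id) { this.id = id; }
        @Override public int hashCode() { return 0; }
        @Override public boolean equals(final Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
        @Override public int compareTo(final CollidingKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(final String[] args) {
        final HashMap<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new CollidingKey(i), i);
        }
        // Despite 10,000 colliding keys, this lookup walks a red-black
        // tree of depth roughly log2(10,000), about 14, rather than
        // scanning a 10,000-node list.
        System.out.println(map.get(new CollidingKey(9_999)));
    }
}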
Feel free to dig deeper into the code, but it gets a bit more complex down there.
You are right, the main cost is the lookup. Replacing the associated value with the new one is O(1) once we have found the correct spot, but the lookup itself is only amortized O(1).
As shown in the code accompanying the incorrect answer of Zabuzard, Java's HashMap uses a classical approach where, if you are lucky (the entry you are looking for is the first in the bucket), you get O(1). If you are less lucky, or you have a poor-quality hash function (just suppose the worst case, where all elements map to the same hash), then to avoid the dreaded O(n) of iterating a plain linked list in the bucket, Java's implementation converts the bucket to a red-black tree (the same structure used by TreeMap), providing O(log n) complexity. So Java's HashMap, if used correctly, should yield basically O(1) replace, and if used incorrectly will degrade gracefully to O(log n) complexity. The threshold for this conversion is TREEIFY_THRESHOLD (8 in modern implementations).
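For reference, the relevant constants in the JDK source (Java 8 and later) are:

// From java.util.HashMap:
static final int TREEIFY_THRESHOLD = 8;     // convert a bin to a tree above this size
static final int UNTREEIFY_THRESHOLD = 6;   // convert a tree back to a list below this size
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table size before bins are treeified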
Please have a look at these implementation notes in the source: https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/util/HashMap.java#L143-L231
The basics:
- Entries (Node and TreeNode) are stored inside buckets.
- In one replace/contains/put/get operation, bucket collisions have to be resolved by searching within the bucket.

Furthermore, worst case (hash collisions, e.g. caused by bad hashCode()s):
- The plain Node implementation functions like a LinkedList, so a (LinkedList-like) search has O(n/2) = O(n) complexity.

Also, check out the comments and the code yourself, especially tableSizeFor(int cap) and getNode(): https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java

Specifically:
- The table size is always a power of two, so 2^n - 1 can be used as the bit mask for the hash.
- The bucket is looked up via first = tab[(n - 1) & hash], with 'first' being the bucket.

To illustrate how to research this yourself, I wrote some code showing worst-case (hash collision) behaviour:
import java.util.HashMap;

public class TestHashMapCollisions {

    static class C {
        private final String mName;

        public C(final String pName) {
            mName = pName;
        }

        @Override public int hashCode() {
            return 1; // force every key into the same bucket
        }

        @Override public boolean equals(final Object obj) {
            if (this == obj) return true;
            if (obj == null) return false;
            if (getClass() != obj.getClass()) return false;
            final C other = (C) obj;
            if (mName == null) {
                if (other.mName != null) return false;
            } else if (!mName.equals(other.mName)) return false;
            return true;
        }
    }

    public static void main(final String[] args) {
        final HashMap<C, Long> testMap = new HashMap<>();
        for (int i = 0; i < 5; i++) {
            final String name = "name" + i;
            final C c = new C(name);
            final Long value = Long.valueOf(i);
            testMap.put(c, value);
        }

        final C c = new C("name2");
        System.out.println("Result: " + testMap.get(c));
        System.out.println("End.");
    }
}
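Run as-is, the program prints Result: 2 followed by End., since the lookup succeeds even though all five keys collide. The more instructive part, though, is stepping through the lookup in a debugger: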
Procedure:
1. Set a breakpoint on the line System.out.println("Result: " + testMap.get(c)); and run the program in debug mode.
2. When the breakpoint hits, step into the HashMap implementation.
3. Step into HashMap.getNode() (the method declaring Node<K,V>[] tab; Node<K,V> first, e; int n; K k;) and watch how the collided bucket is searched.

Hint: You could immediately set the breakpoint inside HashMap, but this would lead to a little chaos, as HashMap is used quite often when the JVM initializes, so you'll hit a lot of unwanted stops first, before you get to testing your code.