I was wondering what the complexity of replace(Key, Value) for a HashMap is. My initial thought is O(1), since it's O(1) to get the value and I can simply replace the value assigned to the key. I'm unsure whether I should take into account the collisions that there might be in a large HashMap implemented in Java with java.util.
HashMap put and get operations have O(1) time complexity, under the assumption that key-value pairs are well distributed across the buckets.

It is O(1) only if your hashing function is very good. The Java hash table implementation does not protect against bad hash functions. Whether you need to grow the table when you add items is not relevant to the question, because it is about lookup time.

Hash tables seem to be O(1) because they have a small constant factor, and the 'n' in their O(log n) behavior is stretched to the point that, for many practical applications, performance is independent of the actual number of items you are using.
On average, the time complexity of HashMap insertion, deletion, and search is O(1) constant time. That said, in the worst case, Java takes O(n) time for searching, insertion, and deletion. Java uses chaining and rehashing to handle collisions.

HashMap has a hash collision problem: in the worst case, all keys are allocated to the same bucket, and the time complexity of put and get becomes O(n). Therefore, a problem that HashMap's designers had to consider is how to reduce hash collisions.

What is the complexity of containsKey() in HashMap? Generally, if there is no collision on the hash of the key, the complexity of containsKey is O(1). This can be understood by looking at how the method is implemented: the HashMap contains an array of nodes (the buckets).

A HashMap in Java is an implementation of the hash table data structure, whose purpose is constant O(1) running time for commonly used operations like put() and get(). However, O(1) is not always possible for get() due to hash collisions.

With open addressing, the space complexity of a hash map remains O(n), where n is the number of items inserted, since every element stored in the table must have some memory associated with it.
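To see the difference between a good and a bad hash function in action, here is a small, unscientific timing sketch; the class and key names (BadHashDemo, BadKey) are made up for illustration:

import java.util.HashMap;

public class BadHashDemo {

    // Deliberately bad key: every instance hashes to the same bucket.
    static final class BadKey {
        final int id;
        BadKey(final int id) { this.id = id; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(final Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(final String[] args) {
        final HashMap<BadKey, Integer> bad = new HashMap<>();
        final HashMap<Integer, Integer> good = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            bad.put(new BadKey(i), i);   // all 10,000 keys share one bucket
            good.put(i, i);              // Integer keys spread across buckets
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < 10_000; i++) bad.get(new BadKey(i));
        long t1 = System.nanoTime();
        for (int i = 0; i < 10_000; i++) good.get(i);
        long t2 = System.nanoTime();

        System.out.println("bad keys:  " + (t1 - t0) + " ns");
        System.out.println("good keys: " + (t2 - t1) + " ns");
    }
}

Since BadKey is not Comparable, Java cannot even fully order the collided bucket once it is treeified, so the gap between the two timings is typically large (exact numbers depend on your machine and JVM).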
HashMap#replace runs in O(1) amortized; and, under the premise that the map is properly balanced (which Java takes care of during your put and remove calls), also non-amortized. Whether it also holds for a non-amortized analysis hinges on the implemented self-balancing mechanism.
Basically, since replace only changes the value, which influences neither hashing nor the general structure of the HashMap, replacing a value will not trigger any re-hashing or re-organization of the internal structure. Hence we only pay for the cost of locating the key, which depends on the bucket size. The bucket size, if the map is properly self-balanced, can be considered a constant, leading to O(1) for replace also non-amortized.
However, the implementation triggers self-balancing and re-hashing based on heuristic factors only. A deep analysis of that is a bit more complex.
So the reality is probably somewhere in between due to the heuristics.
To be sure, let us take a look at the current implementation (Java 16):
@Override
public V replace(K key, V value) {
    Node<K,V> e;
    if ((e = getNode(key)) != null) {
        V oldValue = e.value;
        e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }
    return null;
}
The method afterNodeAccess is a dummy for subclasses and is empty in HashMap. Everything except getNode trivially runs in O(1).
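For completeness, here is a tiny usage sketch of the contract that this code implements (the class name ReplaceDemo is just for illustration): the old value is returned on a successful replace, and an absent key is left untouched:

import java.util.HashMap;

public class ReplaceDemo {
    public static void main(String[] args) {
        HashMap<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        Integer old = map.replace("a", 2);  // returns 1; map is now {a=2}
        Integer none = map.replace("b", 3); // returns null; no mapping is added

        System.out.println(old + ", " + none + ", " + map);
    }
}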
getNode
getNode is the canonical implementation of locating an entry in a HashMap, which we know runs in O(1) for a properly self-balancing map like Java's implementation. Let's take a look at the code:
/**
 * Implements Map.get and related methods.
 *
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n, hash; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & (hash = hash(key))]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}
This method basically computes the hash (hash = hash(key)), then looks up the bucket in the table (first = tab[(n - 1) & (hash = hash(key))]) and starts iterating through the data structure stored in that bucket. Regarding the bucket's data structure, we have a little branching going on at if (first instanceof TreeNode): the buckets are either simple, implicitly linked lists or red-black trees.
For the linked list, we have a straightforward iteration

    do {
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    } while ((e = e.next) != null);

which obviously runs in O(m), with m being the size of the linked list.
For the red-black tree, we have

    return ((TreeNode<K,V>)first).getTreeNode(hash, key);

Lookup in a red-black tree is O(log m), with m being the size of the tree.
Java's implementation makes sure to re-balance the buckets by rehashing if it detects that things get out of hand (you pay for that on each modifying method like put or remove). So in both cases we can consider the size of the buckets as constant or, due to the heuristics involved with self-balancing, close to a constant.
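To make the rehashing heuristic concrete, here is a minimal sketch; the capacity and load factor below are simply HashMap's documented defaults, written out explicitly, and the class name is invented:

import java.util.HashMap;

public class LoadFactorDemo {
    public static void main(String[] args) {
        // Initial capacity 16, load factor 0.75 (these are the defaults,
        // spelled out here for illustration).
        HashMap<Integer, Integer> map = new HashMap<>(16, 0.75f);
        for (int i = 1; i <= 13; i++) {
            map.put(i, i);
        }
        // The 13th put exceeds the threshold 16 * 0.75 = 12, so the table
        // doubles to 32 internally and all entries are redistributed,
        // keeping the average bucket short.
        System.out.println(map.size()); // 13
    }
}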
Having the buckets at constant size effectively makes getNode run in O(1), leading to replace running in O(1) as well. Without any self-balancing mechanism, the worst case would degrade to O(n) if a linked list is used, and to O(log n) for a red-black tree (for the case that all keys yield a hash collision).
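That O(log n) fallback is observable with a sketch like the following (all names invented for the demo): keys that always collide but implement Comparable can be fully ordered in the treeified bucket, so lookups stay logarithmic:

import java.util.HashMap;

public class TreeBinDemo {
    // All keys collide on purpose; because the key is Comparable, large
    // bins are treeified and lookups stay O(log n) instead of O(n).
    static final class CollidingKey implements Comparable<CollidingKey> {
        final int id;
        CollidingKey(final int id) { this.id = id; }
        @Override public int hashCode() { return 0; }
        @Override public boolean equals(final Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id == id;
        }
        @Override public int compareTo(final CollidingKey other) {
            return Integer.compare(id, other.id);
        }
    }

    public static void main(final String[] args) {
        final HashMap<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            map.put(new CollidingKey(i), i);
        }
        // Despite 10,000 colliding keys, this lookup walks a red-black
        // tree of depth roughly log2(10,000), about 14, rather than
        // scanning a 10,000-node list.
        System.out.println(map.get(new CollidingKey(9_999)));
    }
}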
Feel free to dig deeper into the code, but it gets a bit more complex down there.
You are right, the main cost is the lookup. Replacing the associated value with the new one is O(1) once we have found the correct spot, but the lookup itself is only amortized O(1).
As shown in the code accompanying the incorrect answer of Zabuzard, Java's HashMap uses a classical approach where, if you are lucky (the entry you are looking for is the first in the bucket), you get O(1). If you are less lucky, or you have a poor-quality hash function (just suppose the worst case, where all elements map to the same hash), then to avoid the dreaded O(n) of iterating a plain linked list in the bucket, Java's implementation converts the bucket to a red-black tree (the same structure used by TreeMap), providing O(log n) complexity. So Java's HashMap, if used correctly, should yield basically O(1) replace, and if used incorrectly will degrade gracefully to O(log n) complexity. The threshold for this conversion is TREEIFY_THRESHOLD (8 in modern implementations).
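For reference, the relevant constants in the JDK source (Java 8 and later) are:

// From java.util.HashMap:
static final int TREEIFY_THRESHOLD = 8;     // convert a bin to a tree above this size
static final int UNTREEIFY_THRESHOLD = 6;   // convert a tree back to a list below this size
static final int MIN_TREEIFY_CAPACITY = 64; // minimum table size before bins are treeified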
Please have a look at these implementation notes in the source: https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/util/HashMap.java#L143-L231
The basics:
- Entries (Node and TreeNode) are stored inside buckets.
- In one replace/contains/put/get operation, bucket collisions have to be resolved by searching within the bucket.

Furthermore, worst case (hash collisions, e.g. caused by bad hashCode()s):
- The plain Node implementation functions like a LinkedList, so a (LinkedList-like) search has O(n/2) = O(n) complexity.

Also, check out the comments and the code yourself, especially tableSizeFor(int cap) and getNode(): https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java

Specifically:
- The table size is always a power of two, so 2^n - 1 can be used as the bit mask for the hash.
- The bucket is looked up via first = tab[(n - 1) & hash], with 'first' being the bucket.

To illustrate how to research this yourself, I wrote some code showing worst-case (hash collision) behaviour:
import java.util.HashMap;

public class TestHashMapCollisions {

    static class C {
        private final String mName;

        public C(final String pName) {
            mName = pName;
        }

        @Override public int hashCode() {
            return 1; // force every key into the same bucket
        }

        @Override public boolean equals(final Object obj) {
            if (this == obj) return true;
            if (obj == null) return false;
            if (getClass() != obj.getClass()) return false;
            final C other = (C) obj;
            if (mName == null) {
                if (other.mName != null) return false;
            } else if (!mName.equals(other.mName)) return false;
            return true;
        }
    }

    public static void main(final String[] args) {
        final HashMap<C, Long> testMap = new HashMap<>();
        for (int i = 0; i < 5; i++) {
            final String name = "name" + i;
            final C c = new C(name);
            final Long value = Long.valueOf(i);
            testMap.put(c, value);
        }

        final C c = new C("name2");
        System.out.println("Result: " + testMap.get(c));
        System.out.println("End.");
    }
}
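Run as-is, the program prints Result: 2 followed by End., since the lookup succeeds even though all five keys collide. The more instructive part, though, is stepping through the lookup in a debugger: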
Procedure:
1. Set a breakpoint on the line System.out.println("Result: " + testMap.get(c)); and run the program in debug mode.
2. When the breakpoint hits, step into the HashMap implementation.
3. Step into HashMap.getNode() (the method declaring Node<K,V>[] tab; Node<K,V> first, e; int n; K k;) and watch how the collided bucket is searched.

Hint: You could immediately set the breakpoint inside HashMap, but this would lead to a little chaos, as HashMap is used quite often when the JVM initializes, so you'll hit a lot of unwanted stops first, before you get to testing your code.