What is the complexity of HashMap#replace?

I was wondering what the complexity of replace(Key, Value) for a HashMap is.

My initial thought is O(1), since it's O(1) to get the entry and I can simply replace the value assigned to the key.

I'm unsure whether I should take into account collisions that might occur in a large HashMap implemented in Java with java.util.
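
For context, here is a minimal sketch of what replace(key, value) does (the class name ReplaceDemo is made up for the example): it swaps the mapped value in place and returns the old one, or returns null without inserting anything if the key is absent.

import java.util.HashMap;
import java.util.Map;

public class ReplaceDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);

        // replace only touches the value; the key (and thus its hash/bucket) is untouched
        Integer old = map.replace("a", 2);   // returns the previous value: 1
        Integer none = map.replace("b", 9);  // key absent: returns null, map unchanged

        System.out.println(old);   // 1
        System.out.println(none);  // null
        System.out.println(map);   // {a=2}
    }
}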

asked Aug 25 '21 by a_confused_student


People also ask

What is time complexity of HashMap get?

HashMap put and get operations have O(1) time complexity, under the assumption that key-value pairs are well distributed across the buckets.

Is HashMap always O 1?

It is O(1) only if your hashing function is very good. The Java hash table implementation does not protect against bad hash functions. Whether you need to grow the table when you add items or not is not relevant to the question because it is about lookup time.

Why HashMap get is O 1?

Hash tables seem to be O(1) because they have a small constant factor, and the base of the hidden logarithm is so large that, for many practical applications, the cost is effectively independent of the number of items you are using.

Is HashMap remove constant time?

On average, HashMap insertion, deletion, and search take O(1) constant time. That said, in the worst case Java takes O(n) time for searching, insertion, and deletion. Java uses chaining and rehashing to handle collisions.

What is the time complexity of equals method in HashMap?

The time complexity of the equals method is O(n). However, HashMap has a hash collision problem: in the worst case, all keys are allocated to the same bucket, and the time complexity of put and get then becomes O(n). Therefore, a problem that HashMap designers must consider is reducing hash collisions.

What is the complexity of containsKey() in HashMap?

Generally, if there is no collision on the hash value of the key, the complexity of containsKey is O(1). The complexity can be understood by looking at how the method is implemented: the HashMap contains an array of nodes (buckets).

What is a hashmap in Java?

A HashMap in Java is an implementation of the hash table data structure. The purpose of a hash table is to achieve O(1), i.e. constant, running time for commonly used operations like put() and get(). However, O(1) is not always possible for get() due to hash collisions.

What is the space complexity of a hash map?

In all cases of open addressing, the space complexity of a hash map remains O(n), since every element stored in the table must have some memory associated with it (more precisely O(m + n), where m is the size of the hash table and n is the number of items inserted).


3 Answers

tl;dr

HashMap#replace runs in O(1) amortized;

and under the premise that the map is properly balanced, which Java takes care of during your put and remove calls, also non-amortized.

Non-amortized

Whether it also holds for a non-amortized analysis hinges on the implemented self-balancing mechanism.

Basically, since replace only changes the value, which influences neither the hashing nor the general structure of the HashMap, replacing a value will not trigger any re-hashing or re-organization of the internal structure.

Hence we only pay for the cost of locating the key, which depends on the bucket size.

The bucket size, if the map is properly self-balanced, can be considered a constant. Leading to O(1) for replace also non-amortized.

However, the implementation triggers self-balancing and re-hashing based on heuristic factors only. A deep analysis of that is a bit more complex.

So the reality is probably somewhere in between due to the heuristics.


Implementation

To be sure, let us take a look at the current implementation (Java 16):

@Override
public V replace(K key, V value) {
    Node<K,V> e;
    if ((e = getNode(key)) != null) {
        V oldValue = e.value;
        e.value = value;
        afterNodeAccess(e);
        return oldValue;
    }
    return null;
}

The method afterNodeAccess is a callback hook for subclasses (such as LinkedHashMap) and is empty in HashMap itself. Everything except getNode trivially runs in O(1).

getNode

getNode is the canonical implementation of locating an entry in a HashMap, which we know runs in O(1) for a properly self-balancing map, like Java's implementation. Let's take a look at the code:

/**
 * Implements Map.get and related methods.
 *
 * @param key the key
 * @return the node, or null if none
 */
final Node<K,V> getNode(Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n, hash; K k;
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & (hash = hash(key))]) != null) {
        if (first.hash == hash && // always check first node
            ((k = first.key) == key || (key != null && key.equals(k))))
            return first;
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)
                return ((TreeNode<K,V>)first).getTreeNode(hash, key);
            do {
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

This method basically computes the hash (hash = hash(key)), then looks up the corresponding bucket in the table (first = tab[(n - 1) & hash]) and starts iterating through the data structure stored in that bucket.
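
To make that index computation concrete, here is a small standalone sketch (not the JDK source itself; the class name is made up) that mirrors the idea of the hash spreading and the (n - 1) & hash masking described above:

public class BucketIndexDemo {
    // Mirrors the idea of HashMap.hash(): spread the high bits of hashCode()
    // into the low bits, because the index mask below only keeps the low bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                      // table length, always a power of two
        String key = "hello";
        int hash = spread(key);
        int index = (n - 1) & hash;      // picks a bucket in [0, n - 1] without a division
        System.out.println("bucket index = " + index);
    }
}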

Regarding the data structure for the bucket, we have a little branching going on at if (first instanceof TreeNode).

Bucket

The buckets are either simple implicitly linked lists (chains of Node) or red-black trees (TreeNode).

Linked List

For the linked list, we have a straightforward iteration

do {
     if (e.hash == hash &&
        ((k = e.key) == key || (key != null && key.equals(k))))
        return e;
} while ((e = e.next) != null);

which obviously runs in O(m) with m being the size of the linked list.

Red-Black-Tree

For the red-black-tree, we have

return ((TreeNode<K,V>)first).getTreeNode(hash, key);

Lookup in a red-black-tree is O(log m), with m being the size of the tree.

Bucket size

Java's implementation makes sure to re-balance the buckets by rehashing if it detects that they get out of hand (you pay for that on each modifying method like put or remove).

So in both cases we can consider the size of the buckets as constant or, due to the heuristics involved with self-balancing, close to a constant.

Conclusion

Having the buckets at constant size, effectively, makes getNode run in O(1), leading to replace running in O(1) as well.

Without any self-balancing mechanism, in the worst case it would degrade to O(n) if a linked list is used and to O(log n) for a red-black tree (for the case in which all keys yield a hash collision).

Feel free to dig deeper into the code but it gets a bit more complex down there.

answered Oct 20 '22 by Zabuzard


You are right, the main cost is the lookup, which is amortized O(1).

Replacing the associated value with the new one is O(1) once we have found the correct spot. But the lookup is only amortized O(1).

As shown in the code accompanying Zabuzard's answer (which I consider incorrect), Java's HashMap uses a classical approach: if you are lucky (the entry you are looking for is the first in the bucket), you get O(1).

If you are less lucky, or you have a poor-quality hash function (suppose the worst case: all elements map to the same hash), then to avoid the dreaded O(n) of iterating a plain linked list in the bucket, Java's implementation converts the bucket into a red-black tree (TreeNode) to provide O(log n) complexity.

So Java's HashMap, if used correctly, should yield basically O(1) replace, and if used incorrectly it degrades gracefully to O(log n) complexity. The threshold is TREEIFY_THRESHOLD (its value is 8 in modern implementations).
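
To illustrate that threshold, here is a sketch (class names invented for the example) that forces every key into one bucket via a constant hashCode. Once a bucket chain grows past TREEIFY_THRESHOLD (8) entries and the table has at least MIN_TREEIFY_CAPACITY (64) slots, the chain is converted to a red-black tree, so get and replace stay O(log n) instead of O(n). Making the key Comparable lets the tree order the colliding keys directly.

import java.util.HashMap;
import java.util.Map;

public class TreeifyDemo {
    // All instances collide into the same bucket on purpose.
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }           // constant => worst case
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey other) {            // lets the treeified bucket stay ordered
            return Integer.compare(id, other.id);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>(128);            // capacity >= MIN_TREEIFY_CAPACITY (64)
        for (int i = 0; i < 1000; i++) {
            map.put(new BadKey(i), i);                            // the chain treeifies once it exceeds 8 entries
        }
        // Despite 1000 colliding keys, lookups and replaces are O(log n), not O(n).
        System.out.println(map.get(new BadKey(700)));             // 700
        map.replace(new BadKey(700), -1);
        System.out.println(map.get(new BadKey(700)));             // -1
    }
}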

Please have a look at these implementation notes in the source: https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/util/HashMap.java#L143-L231

answered Oct 20 '22 by Yann TM


The basics:

  • java.util.HashMap will resize itself to match the number of elements,
  • so collisions are quite rare (compared to n)
  • for collisions, modern HashMap implementations use trees (TreeNode) inside buckets once a plain chain of Nodes grows too long

In one replace/contains/put/get operation, considering bucket collisions:

  • if you have k entries colliding in one bucket out of n total, a naive scan of that bucket costs O(k),
  • which the tree search reduces to O(log2(k)),
  • which in O notation, with k being a small number, is equivalent to O(1).

Furthermore, for the worst case of hash collisions:

  • if you have a really bad hash function that always gives the same result,
  • then you get hash collisions,
  • for hash collisions, the Node implementation behaves like a LinkedList,
  • so you would have (with this LinkedList-like search) O(n/2) = O(n) complexity.
  • But this would have to be made on purpose, because
  • the hash-bit spreading and the power-of-two index masking give really good distributions, as long as you don't have too many identical hashCode()s,
  • and most IDs or simple id sequencing (like primary keys in databases) will provide a near-perfect distribution
    • with an id-sequenced hash function you will not have any (hash or bucket) collisions, so you could effectively use plain array indexing instead of hash functions and collision handling (see the sketch after this list)
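
A minimal sketch of that last point (IdKey is a made-up name for the example): a key whose hashCode() is simply its sequential id spreads entries essentially perfectly across the buckets, so lookups never leave the O(1) path.

import java.util.HashMap;
import java.util.Map;

public class SequentialIdKeys {
    // A key whose hash is just its sequential id: near-perfect bucket distribution.
    static final class IdKey {
        final int id;
        IdKey(int id) { this.id = id; }

        @Override public int hashCode() { return id; }
        @Override public boolean equals(Object o) {
            return o instanceof IdKey && ((IdKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<IdKey, String> map = new HashMap<>();
        for (int i = 0; i < 1_000_000; i++) {
            map.put(new IdKey(i), "row-" + i);   // consecutive ids land in consecutive buckets, no collisions
        }
        System.out.println(map.get(new IdKey(123_456)));  // row-123456
    }
}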

Also, check out the comments and the code yourself: https://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/HashMap.java

  • tableSizeFor(int cap)
  • getNode()

Specifically:

  • the table size for the bucket array is always rounded up to a power of two (2^n), so the index mask used below is 2^n - 1 (see the sketch after this list)
  • getting the bucket is first = tab[(n - 1) & hash], with 'first' being the head node of the bucket
    • which is NOT a modulo operation, but simply a bit-wise AND,
      • which is faster,
      • can use more valid bits,
      • and produces comparably well-distributed results
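
As a sketch of the table-sizing idea (not the exact JDK code; names are made up, edge cases like cap = 0 are ignored): rounding the capacity up to the next power of two is what lets the bit-wise AND pick the same bucket a modulo would, without a division.

public class TableSizeDemo {
    // Rounds cap up to the next power of two (same spirit as HashMap.tableSizeFor).
    static int nextPowerOfTwo(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n + 1;
    }

    public static void main(String[] args) {
        int n = nextPowerOfTwo(100);                 // 128
        int hash = "someKey".hashCode();
        // Because n is a power of two, (n - 1) is an all-ones bit mask,
        // so the AND below yields the same bucket as floorMod(hash, n), just cheaper.
        System.out.println(n);                       // 128
        System.out.println((n - 1) & hash);          // bucket index in [0, 127]
        System.out.println(Math.floorMod(hash, n));  // same value, computed with modulo
    }
}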

To illustrate how to research this yourself, I wrote some code showing worst-case (hash collision) behaviour:

import java.util.HashMap;

public class TestHashMapCollisions {

    static class C {
        private final String mName;

        public C(final String pName) {
            mName = pName;
        }

        // Constant hash code: forces every instance into the same bucket (worst case).
        @Override public int hashCode() {
            return 1;
        }
        @Override public boolean equals(final Object obj) {
            if (this == obj) return true;
            if (obj == null) return false;
            if (getClass() != obj.getClass()) return false;
            final C other = (C) obj;
            if (mName == null) {
                if (other.mName != null) return false;
            } else if (!mName.equals(other.mName)) return false;
            return true;
        }
    }


    public static void main(final String[] args) {
        final HashMap<C, Long> testMap = new HashMap<>();
        for (int i = 0; i < 5; i++) {
            final String name = "name" + i;
            final C c = new C(name);
            final Long value = Long.valueOf(i);
            testMap.put(c, value);
        }

        final C c = new C("name2");
        System.out.println("Result: " + testMap.get(c));
        System.out.println("End.");
    }
}

Procedure:

  • use an IDE
  • link the source code of the JDK/JRE you're using to your IDE
  • set a breakpoint to the line System.out.println("Result: " + testMap.get(c));
  • run in debug
  • debugger halts at the breakpoint
  • now go into the HashMap implementation
  • set a breakpoint on the first line of HashMap.getNode() (Node<K,V>[] tab; Node<K,V> first, e; int n, hash; K k;)
  • resume debug; debugger will halt inside HashMap
  • now you can follow the debugger step-by-step

Hint: you could immediately set the breakpoint inside HashMap, but this would lead to a little chaos, as HashMap is used quite often while the JVM initializes, so you would hit a lot of unwanted stops before you get to testing your own code.

answered Oct 20 '22 by JayC667