Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java:Internal data structure in Map

I am trying to create a class that implements the Map interface. So I am writing code that will check if the calling object is empty or not. However I am a little confused as to which data structure I should use internally. At present I am using a Hash Table. Thanks in advance

like image 483
Rohan Avatar asked Feb 27 '23 16:02

Rohan


2 Answers

From Wikipedia,

Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures (such as association lists).

Efficient representations:
There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree (such as a red-black tree or an AVL tree). Skip lists are also an alternative, though relatively new and not as widely used. B-trees (and variants) can also be used, and are commonly used when the associative array is too large to reside entirely in memory, for instance in a simple database. Relative advantages and disadvantages include:

  1. Asymptotic operation performance: Hash tables have faster average lookup and insertion time, O(1), compared to a balanced binary search tree's Θ(log n), while balanced trees have faster worst-case lookup and insertion time, O(log n) as compared to Θ(n). Skip lists have O(n) worst-case and O(log n) average-case operation times, but with less insertion and deletion overhead in practice than balanced binary trees.

  2. Ordering preservation: Balanced binary trees and skip lists preserve ordering — allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently (they require the data to be sorted in a separate step).

  3. Range queries: Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range. (With n elements in the array and performing the operation on a contiguous range of m keys, a balanced binary tree will take O(log(n)+m) time while a hash table would need Θ(n) time as it needs to search the entire table.)

  4. Allocation behavior: Hash tables with open addressing store all data in a large contiguous block of memory that is reallocated infrequently, whereas tree allocations perform many small, frequent allocations. As a result hash tables may be difficult to allocate in a fragmented heap, and conversely trees may cause heap fragmentation. Trees are also more vulnerable to inefficiencies in allocators.

  5. Compactness: Hash tables can have more compact storage for small value types, especially when the values are bits.

  6. Persistence: There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.

  7. Supporting new key types: Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys.

Sometimes simple implementations of one data structure or the other have disadvantages that can be overcome by better design. For example:

  • Hash tables that use untrusted input as keys may be vulnerable to denial-of-service attacks where an untrusted user supplies data intended to generate large numbers of collisions. This may be overcome by choosing hash functions at random from a universal family, or by hashing untrusted input with a cryptographic hash function before insertion.

  • Simple balanced trees waste space on pointers and allocation metadata; these problems can be mitigated by storing multiple elements in each node and by using memory pools.

like image 134
missingfaktor Avatar answered Mar 01 '23 11:03

missingfaktor


In addition to the table itself you could also maintain an integer member variable to track the size of the collection, incrementing it each time a new mapping is put and decrementing each time one is removed. This way, you can simplify the size and isEmpty interface methods:

public int size() {
    return this.size;
}

public boolean isEmpty() {
    return this.size == 0;
}
like image 29
maerics Avatar answered Mar 01 '23 11:03

maerics