I am confused for the last few days in finding the difference between primary and secondary clustering in hash collision management topic in the textbook I am reading.
In computer programming, primary clustering is one of two major failure modes of open addressing based hash tables, especially those using linear probing.
(definition) Definition: The tendency for some collision resolution schemes to create long run of filled slots away from a key hash position, e.g., along the probe sequence.
A normal hashing process consists of a hash function taking a key and producing the hash table index for that key. In double hashing, there are two hash functions. The second hash function is used to provide an offset value in case the first function causes a collision.
To avoid secondary clustering, we need to have the probe sequence make use of the original key value in its decision-making process. A simple technique for doing this is to return to linear probing by a constant step size for the probe function, but to have that constant be determined by a second hash function, h2.
x
, subsequent probes go to x+1
, x+2
, x+3
and so on, this results in Primary Clustering.x
, probes go to x+1
, x+4
, x+9
, x+16,
x+25
and so on, this results in Secondary Clustering.Primary clustering means that if there is a cluster and the initial position of a new record would fall anywhere in the cluster the cluster size increases. Linear probing leads to this type of clustering.
Secondary clustering is less severe, two records do only have the same collision chain if their initial position is the same. For example quadratic probing leads to this type of clustering.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With