
What are primary and secondary clustering in hashing?

For the last few days I have been confused about the difference between primary and secondary clustering, which the textbook I am reading covers under hash collision management.

Rickx asked Jan 02 '15 12:01



People also ask

What is meant by primary clustering in hash tables?

In computer programming, primary clustering is one of two major failure modes of open addressing based hash tables, especially those using linear probing.

What do you mean by secondary clustering?

Definition: The tendency for some collision resolution schemes to create long runs of filled slots away from a key's hash position, e.g., along the probe sequence.

What is a secondary hash function?

A normal hashing process consists of a hash function taking a key and producing the hash table index for that key. In double hashing, there are two hash functions. The second hash function is used to provide an offset value in case the first function causes a collision.

How do you fix secondary clustering?

To avoid secondary clustering, the probe sequence needs to make use of the original key value in its decision-making process. A simple technique for doing this is to go back to linear probing with a constant step size, but to have that constant be determined by a second hash function, h2, so that keys with the same home slot usually get different step sizes.
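
As a rough illustration of that idea, here is a minimal sketch of a double-hashing probe sequence. The two hash functions (`key % m` and `1 + key % (m - 1)`) and the table size of 11 are assumptions chosen for the example, not something the quoted text specifies.

```python
def double_hash_sequence(key, m, count):
    """Return the first `count` slots probed for `key` in a table of size m."""
    h1 = key % m              # assumed first hash: the home slot
    h2 = 1 + (key % (m - 1))  # assumed second hash: step size, never 0
    return [(h1 + i * h2) % m for i in range(count)]

# Keys 3 and 14 share the home slot 3 in a table of size 11,
# but their step sizes differ, so their probe sequences diverge.
seq_3 = double_hash_sequence(3, 11, 4)
seq_14 = double_hash_sequence(14, 11, 4)
print(seq_3)   # [3, 7, 0, 4]
print(seq_14)  # [3, 8, 2, 7]
```

Using a prime table size is a common choice here because every nonzero step size then visits all slots.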


2 Answers

Primary Clustering

  1. Primary clustering is the tendency for a collision resolution scheme such as linear probing to create long runs of filled slots near the hash position of keys.
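  2. If the primary hash index is x, subsequent probes go to x+1, x+2, x+3 and so on. Any key that hashes into an occupied run ends up at the end of that run, which results in primary clustering.
  3. Once a primary cluster forms, the bigger it gets, the faster it grows, because more hash positions fall inside it. Longer clusters mean longer probe sequences, which reduces performance.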
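
The growth of a cluster can be sketched in a few lines. This is only an illustration: the hash function `key % m`, the table size of 11, and the chosen keys are assumptions made for the example.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing and return the slot it ends up in."""
    m = len(table)
    home = key % m  # assumed hash function: key mod table size
    for i in range(m):
        slot = (home + i) % m  # probe x, x+1, x+2, ...
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("table is full")

table = [None] * 11
# Home slots of these keys are 3, 3, 4, 5, 3. Keys with different home
# slots still pile onto the same run, so one cluster keeps growing.
slots = [linear_probe_insert(table, k) for k in (3, 14, 4, 5, 25)]
print(slots)  # [3, 4, 5, 6, 7]
```

Note that the last key, 25, has home slot 3 but needs five probes before it finds the free slot 7.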



Secondary Clustering

  1. Secondary clustering is the tendency for a collision resolution scheme such as quadratic probing to create long runs of filled slots away from the hash position of keys.
  2. If the primary hash index is x, probes go to x+1, x+4, x+9, x+16, x+25 and so on. Every key that starts at the same index x follows this same sequence, which results in secondary clustering.
  3. Secondary clustering hurts performance less than primary clustering. Quadratic probing is an attempt to keep clusters from forming: the idea is to probe more widely separated cells instead of those adjacent to the primary hash site.
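
A small sketch of the quadratic probe sequence makes the effect visible. The hash function `key % m` and the table size of 11 are assumptions for the example only.

```python
def quadratic_probe_sequence(key, m, count):
    """Return the first `count` slots probed for `key` in a table of size m."""
    home = key % m  # assumed hash function: key mod table size
    return [(home + i * i) % m for i in range(count)]  # x, x+1, x+4, x+9, ...

seq_a = quadratic_probe_sequence(3, 11, 5)   # home slot 3
seq_b = quadratic_probe_sequence(14, 11, 5)  # home slot 3 as well
seq_c = quadratic_probe_sequence(4, 11, 5)   # home slot 4
print(seq_a)  # [3, 4, 7, 1, 8]
print(seq_b)  # [3, 4, 7, 1, 8]
print(seq_c)  # [4, 5, 8, 2, 9]
```

Keys 3 and 14 share a home slot and therefore the whole probe sequence, while key 4 starts one slot over and follows a different path instead of joining the same run.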


Yogesh Umesh Vaity answered Sep 20 '22 16:09



Primary clustering means that if there is a cluster and the initial position of a new record falls anywhere in the cluster, the cluster size increases. Linear probing leads to this type of clustering.

Secondary clustering is less severe: two records have the same collision chain only if their initial position is the same. Quadratic probing, for example, leads to this type of clustering.

Henry answered Sep 21 '22 16:09
