
MySQL: Hamming distance between two pHash values

I have a table A with a column 'template_phash', which stores the pHash values generated from 400K images.

Now I take a random image and generate a pHash from it.

How do I query table A for the records whose Hamming distance to that pHash is below a threshold value, say 20?

I have seen "Hamming distance on binary strings in SQL", but couldn't figure it out.

I think I need to write a function to achieve this, but how?

Both of my pHash values are BIGINTs, e.g. 7641692061273169067

Please help me write the function so that I can query like this:

SELECT product_id, HAMMING_DISTANCE(phash1,  phash2) as hd 
FROM A 
WHERE hd < 20 ORDER BY hd ASC;
Gagan asked Jan 10 '14 06:01



1 Answer

I figured out that the Hamming distance is just the number of bits that differ between the two hashes. First XOR the two hashes, then count the 1 bits in the result:

SELECT product_id, BIT_COUNT(phash1 ^ phash2) AS hd FROM A ORDER BY hd ASC;
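To get the function-style query from the question, the same expression can be wrapped in a stored function. The following is a minimal sketch, not a tested implementation. It assumes the hashes are stored as BIGINT UNSIGNED in the 'template_phash' column, and it uses the example value 7641692061273169067 as a stand-in for the hash of the new image. The name HAMMING_DISTANCE is taken from the question and is not a MySQL built-in. Note that MySQL does not allow a column alias such as hd in a WHERE clause, so the threshold is applied with HAVING instead.

```sql
-- Hypothetical helper: XOR the two 64-bit hashes, then count the set bits.
-- Assumes both arguments fit in BIGINT UNSIGNED; the result is at most 64.
CREATE FUNCTION HAMMING_DISTANCE(a BIGINT UNSIGNED, b BIGINT UNSIGNED)
RETURNS TINYINT UNSIGNED
DETERMINISTIC
RETURN BIT_COUNT(a ^ b);

-- Rows within distance 20 of the query hash, closest first.
-- The literal below stands in for the pHash of the new image.
SELECT product_id,
       HAMMING_DISTANCE(template_phash, 7641692061273169067) AS hd
FROM A
HAVING hd < 20
ORDER BY hd ASC;
```

Because the distance is computed per row, this scans the whole table. An ordinary index on 'template_phash' does not help with this kind of lookup.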
Gagan answered Sep 27 '22 17:09