I have some hashes stored in mysql, which I would fetch with comparison by hamming distance.
Hashes stored are these:
qw 1 ffe71b001820a1fd
qw 2 ffffb81c1c3838a0
qw 3 fff8381c1c3e3828
qw 4 fffa181c3c2e3920
qw 5 fffa981c1c3e2820
qw 6 ff5f1c38387c1c04
qw 7 fff1e0c1c38387ef
qw 8 fffa181c1c3e3820
qw 9 fffa381c1c3e3828
I normally fetch like:
SELECT product_id, HAMMING_DISTANCE(phash, 'phashfromuserinput') ;
But in mysql hamming distance is bitwise operator which I can do if strings were only numbers:
SELECT pagedata,BIT_COUNT(pagecontent^'$encrypted')searchengine WHERE pagecontent > 2 ; ")
It only works in integer (number) but my requirement is work with numbers and alphabets, for example:
74898fababfbef46 and 95efabfeba752545
From my little research I know that first I have to convert field to binary
and then use bitcount
by using CAST
or CONVERT
like:
SELECT BIT_COUNT( CONV( hash, 2, 10 ) ^
0b0000000101100111111100011110000011100000111100011011111110011011 )
or
SELECT BIT_COUNT(CAST(hash AS BINARY)) FROM data;
This is ok as converting data to binary
and using bitcount
. Now question arises that varbinary
characters/hashes stored in mysql
already are alphanumeric and if I convert field to varbinary
and bitcount
then it will not work as stored hashes are not binary strings.
What should I do?
I was refering as php hamming distance matching example of:
function HammingDistance($bin1, $bin2) {
$a1 = str_split($bin1);
$a2 = str_split($bin2);
$dh = 0;
for ($i = 0; $i < count($a1); $i++)
if($a1[$i] != $a2[$i]) $dh++;
return $dh;
}
echo HammingDistance('10101010','01010101'); //returns 8
But I'm not understanding how to match with mysql and fetch, because I can't implement it in mysql.
Using the last two numbers as an example:
SELECT BIT_COUNT( CAST(CONV('fffa181c1c3e3820', 16, 10) AS UNSIGNED) ^
CAST(CONV('fffa381c1c3e3828', 16, 10) AS UNSIGNED) ) ;
--> 2
BIGINT UNSIGNED
.(If you had had MD5 (128-bit) or SHA1 (160-bit) hashes, we would have had to split them via SUBSTR()
, Xor each pair, BIT_COUNT, then added the results.)
Edit to use column name:
SELECT BIT_COUNT( CAST(CONV( a.pagecontent , 16, 10) AS UNSIGNED) ^
CAST(CONV( b.pagecontent , 16, 10) AS UNSIGNED) ) ;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With