I need to search a huge image database for possible duplicates using pHash. Assume the existing image records already have hash codes generated with pHash.
Now I have to take a new image, create its hash with pHash, and compare it against the existing records. But as I understand it, the hash comparison is NOT as straightforward as
hash1 - hash2 < threshold
It looks like I need to pass both hash codes into a pHash API to do the matching. So I would have to retrieve all hash codes from the DB in batches and compare them one by one using the pHash API.
But this does not look like the best approach if I have about 1000 images in the queue to be compared against millions of already existing images.
I need to know the following.
Thanks in advance.
I think part of this question is discussed on the pHash support forum.
You will need to use the MVP tree (multi-vantage point tree) storage mechanism, so that a query does not have to be compared against every stored hash:
http://lists.phash.org/htdig.cgi/phash-support-phash.org/2011-May/000122.html and http://lists.phash.org/htdig.cgi/phash-support-phash.org/2010-October/000103.html
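To illustrate why a metric tree helps, here is a minimal sketch in Python. It assumes the stored hashes are pHash's 64-bit DCT image hashes, which are compared by Hamming distance (the number of differing bits) rather than by subtraction. Note that this is a BK-tree, a simpler metric tree than the MVP tree mentioned above, swapped in only to show the idea; it is not pHash's own implementation. The names `hamming_distance`, `BKTree`, `add`, and `search` are hypothetical, and the threshold is something you would have to tune for your own data.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two integer hashes."""
    return bin(a ^ b).count("1")


class BKTree:
    """Illustrative BK-tree over integer hashes, using Hamming distance.

    Each node is a tuple: (hash_value, record_ids, children), where
    children maps "distance to this node" -> child node.
    """

    def __init__(self):
        self.root = None

    def add(self, hash_value, record_id):
        """Insert one hash with the id of the image record it belongs to."""
        if self.root is None:
            self.root = (hash_value, [record_id], {})
            return
        node = self.root
        while True:
            d = hamming_distance(hash_value, node[0])
            if d == 0:
                # Identical hash already stored: just remember the extra id.
                node[1].append(record_id)
                return
            child = node[2].get(d)
            if child is None:
                node[2][d] = (hash_value, [record_id], {})
                return
            node = child

    def search(self, query, threshold):
        """Return (record_id, distance) for all hashes within threshold."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, ids, children = stack.pop()
            d = hamming_distance(query, value)
            if d <= threshold:
                results.extend((rid, d) for rid in ids)
            # Triangle inequality: a match can only live in a subtree whose
            # edge distance lies within [d - threshold, d + threshold].
            for edge, child in children.items():
                if d - threshold <= edge <= d + threshold:
                    stack.append(child)
        return results
```

In use, you would build the tree once from the hashes already in the database (`tree.add(hash_value, record_id)` per row), then call `tree.search(new_hash, threshold)` for each of the queued images. With a small threshold, most subtrees are pruned, so each query touches far fewer hashes than a full scan.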