Good way to identify similar images? [closed]

Tags: php, image, computer-vision, gd, content-based-retrieval

I've developed a simple and fast algorithm in PHP to compare images for similarity.

It's fast to hash (~40 per second for 800x600 images), and an unoptimised search algorithm can go through 3,000 images in 22 mins, comparing each one against the others (3/sec).

The basic overview: you take an image, rescale it to 8x8 and then convert those pixels to HSV. The Hue, Saturation and Value are each truncated to 4 bits, and together they become one big hex string.
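
As a rough illustration of the hashing step, here is a minimal sketch in Python (not the asker's PHP code). It assumes the image has already been rescaled to an 8x8 grid of RGB tuples, which in PHP would be done with GD. The function name `hsv_hash` is hypothetical, and the digit order (hue, saturation, value per pixel) is an assumption, since the question does not say how the string is laid out.

```python
import colorsys

def hsv_hash(pixels):
    """Hash a grid of (r, g, b) tuples (0-255 each) into a hex string.

    Assumes `pixels` is the already-rescaled 8x8 image, given as a list of
    rows. Each pixel contributes three hex digits: hue, saturation, value.
    """
    digits = []
    for row in pixels:
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            for component in (h, s, v):
                # Each component is in 0.0-1.0; truncate it to 4 bits (0-15).
                digits.append("%x" % min(15, int(component * 16)))
    return "".join(digits)

# An all-red 8x8 image: hue 0, full saturation, full value.
red_image = [[(255, 0, 0)] * 8 for _ in range(8)]
red_hash = hsv_hash(red_image)
```

With 64 pixels and three 4-bit components each, the resulting string is 192 hex characters long.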

Comparing two images walks along the two hash strings and adds up the differences it finds. If the total is below 64, it's the same image. Different images usually score around 600 - 800. Below 20, the images are extremely similar.
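
The comparison described above might look something like the following Python sketch. It assumes the "differences" are absolute differences between corresponding hex digits, which the question implies but does not spell out. The names `hash_distance` and `classify` are made up for illustration. The thresholds 20 and 64 are the ones quoted in the question.

```python
def hash_distance(hash_a, hash_b):
    """Sum of absolute differences between corresponding hex digits."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(abs(int(a, 16) - int(b, 16)) for a, b in zip(hash_a, hash_b))

def classify(distance):
    """Apply the thresholds quoted in the question."""
    if distance < 20:
        return "extremely similar"
    if distance < 64:
        return "same image"
    return "different"

example = classify(hash_distance("0ff" * 64, "0fe" * 64))
```

One thing worth checking when weighing the components: hue is an angle and wraps around, so a plain absolute difference treats hue 0 and hue 15 as far apart even though both are shades of red.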

Are there any improvements to this model that I could use? I haven't looked at how relevant the different components (hue, saturation and value) are to the comparison. Hue is probably quite important, but what about the others?

To speed up searches I could probably split the 4 bits from each part in half and put the most significant bits first, so that if they fail the check the least significant bits don't need to be checked at all. I don't know an efficient way to store bits like that while still allowing them to be searched and compared easily.

I've been using a dataset of 3,000 photos (mostly unique) and there haven't been any false positives. It's completely immune to resizes and fairly resistant to brightness and contrast changes.

253

asked May 15 '10 03:05

Nick

1 Answer

What you want to use is:

1. Feature extraction
2. Hashing
3. Locally aware bloom hashing.

1. Most people use SIFT features, although I've had better experiences with non-scale-invariant ones. Basically, you use an edge detector to find interesting points and then center your image patches around those points. That way you can also detect sub-images.
2. What you implemented is a hash method. There are tons to choose from, but yours should work fine :)
3. The crucial step to making it fast is to hash your hashes. You convert your values into a unary representation and then take a random subset of the bits as the new hash. Do that with 20-50 random samples and you get 20-50 hash tables. If any feature matches in 2 or more of those hash tables, the feature will be very similar to one you already stored. This allows you to convert the abs(x-y)
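
The "hash your hashes" step in the last point can be sketched roughly as follows, in Python. This is one interpretation of the answer, not the answerer's own code. It assumes each 4-bit value is written as 15 unary bits, so the number of differing bits between two encodings equals the sum of absolute differences used in the question. The class name `BitSamplingIndex` and the parameters `num_tables`, `bits_per_key` and `min_matches` are made up for illustration, and the values chosen are untuned guesses.

```python
import random
from collections import defaultdict

def to_unary(hex_hash, levels=15):
    """Encode each hex digit v as v ones followed by (levels - v) zeros."""
    bits = []
    for c in hex_hash:
        v = int(c, 16)
        bits.extend([1] * v + [0] * (levels - v))
    return bits

class BitSamplingIndex:
    def __init__(self, hash_length, num_tables=20, bits_per_key=24, seed=0):
        rng = random.Random(seed)
        total_bits = hash_length * 15
        # One random subset of bit positions per hash table.
        self.samples = [rng.sample(range(total_bits), bits_per_key)
                        for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]

    def _keys(self, hex_hash):
        bits = to_unary(hex_hash)
        return [tuple(bits[i] for i in sample) for sample in self.samples]

    def add(self, name, hex_hash):
        for table, key in zip(self.tables, self._keys(hex_hash)):
            table[key].append(name)

    def query(self, hex_hash, min_matches=2):
        """Names that share a key with the query in at least min_matches tables."""
        counts = defaultdict(int)
        for table, key in zip(self.tables, self._keys(hex_hash)):
            for name in table.get(key, ()):
                counts[name] += 1
        return sorted(n for n, c in counts.items() if c >= min_matches)

index = BitSamplingIndex(hash_length=192)
index.add("red.jpg", "0ff" * 64)
index.add("other.jpg", "a00" * 64)
candidates = index.query("0ff" * 64)
```

A lookup then touches only a handful of buckets instead of every stored hash. The candidates it returns would still be confirmed with the full distance calculation.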

Hope it helps. If you'd like to try out my self-developed image similarity search, drop me a mail at hajo at spratpix

155

answered Nov 02 '22 03:11

fxtentacle
