
Find duplicate images of different sizes

Is there a pre-existing algorithm, library, or framework that compares two images to see if one is a resized version of the other? The programming language doesn't matter at this stage.

If there is nothing out there, I'd need to write something up. What I have thought of so far:

  • (Expensive) Resize the larger to the smaller and compare pixel by pixel.

  • Better yet, resize just a few random "areas" of the picture and compare them. If they match, compare more areas, and so on.

  • Break the image into a number of rows and columns and do some sort of parity math on the color values.

The problem I see, especially with the first two ideas, is that there are different ways to resize a picture in the first place, so the math will likely not work out exactly the same. Some resizing methods add blur, for example.

If anyone could point me to some good literature on this subject, that would be great. My googling turns up mostly shareware applications, which is not what I want.

The goal is to have this running in the back of a webserver.

VaporwareWolf asked Oct 26 '12 17:10




2 Answers

The best approach depends on the characteristics of the images you are comparing: how likely is it that two images are the same, and when they differ, are they typically off by a lot, or could the difference be as small as a single pixel?

If the answer to the above is that the images you need to compare will be completely random, then going with the expensive solution, or some available package, might be the best bet.

If you know that the images are different more often than not, that they typically differ quite a lot, and you really want to hand-roll a solution, you could implement some initial 'quick compare' steps. These would be less expensive and would quickly identify many of the cases where the images are different.

For example, you could resize the larger image, then compare pixel by pixel (or calculate a hash of the pixel values) along only a 'diagonal line' of the image (top-left pixel to bottom-right pixel). That excludes differing images cheaply, so you only do the more expensive comparison for those that pass this test.
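A minimal sketch of the diagonal pre-check, assuming each image has already been decoded into a list of rows of 0-255 grayscale values (decoding is left to whatever imaging library you use). The names `box_downscale`, `diagonal`, `diagonals_match` and the `tolerance` parameter are made up for illustration. The tolerance is there because different resizing methods will not produce identical pixel values.

```python
def box_downscale(pixels, new_w, new_h):
    """Shrink a grayscale image (list of rows of ints) by averaging boxes."""
    old_h, old_w = len(pixels), len(pixels[0])
    out = []
    for y in range(new_h):
        y0 = y * old_h // new_h
        y1 = max(y0 + 1, (y + 1) * old_h // new_h)
        row = []
        for x in range(new_w):
            x0 = x * old_w // new_w
            x1 = max(x0 + 1, (x + 1) * old_w // new_w)
            box = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(box) // len(box))
        out.append(row)
    return out


def diagonal(pixels):
    """Return the pixels along the top-left to bottom-right diagonal."""
    h, w = len(pixels), len(pixels[0])
    steps = max(w, h)
    last = max(steps - 1, 1)
    return [pixels[i * (h - 1) // last][i * (w - 1) // last]
            for i in range(steps)]


def diagonals_match(small, large, tolerance=8):
    """Cheap pre-check: True if both diagonals agree within `tolerance`."""
    shrunk = box_downscale(large, len(small[0]), len(small))
    return all(abs(a - b) <= tolerance
               for a, b in zip(diagonal(small), diagonal(shrunk)))


# Toy data: a 4x4 gradient and an 8x8 copy with every pixel doubled.
small = [[(x + y) * 30 for x in range(4)] for y in range(4)]
large = [[small[y // 2][x // 2] for x in range(8)] for y in range(8)]
inverted = [[255 - v for v in row] for row in large]

print(diagonals_match(small, large))     # same picture at two sizes
print(diagonals_match(small, inverted))  # different picture
```

Only pairs for which `diagonals_match` returns True would go on to the full comparison. A real implementation would probably downscale only the pixels on the diagonal rather than the whole image; the sketch resizes everything to keep the code short.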

Or take a preset number of points, at whatever is a 'good distribution' for the type of image, compare those, and only do the more expensive comparison for the images that pass this test.
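The point-sampling variant could look roughly like this. It is a sketch under the same assumption as before (images as lists of rows of 0-255 grayscale values); `sample_points`, `pixel_at`, `points_match` and the default `tolerance` are illustrative names and values, not part of any library. The points are stored as fractions of the width and height, so the same set can be applied to both images without resizing either one.

```python
import random


def sample_points(count, seed=0):
    """Fixed pseudo-random points in normalized [0, 1) coordinates."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(count)]


def pixel_at(pixels, u, v):
    """Look up the pixel nearest to the normalized position (u, v)."""
    h, w = len(pixels), len(pixels[0])
    return pixels[int(v * h)][int(u * w)]


def points_match(a, b, points, tolerance=8):
    """True if every sampled point agrees within `tolerance`."""
    return all(abs(pixel_at(a, u, v) - pixel_at(b, u, v)) <= tolerance
               for u, v in points)


# Toy data: a 4x4 gradient and an 8x8 copy with every pixel doubled.
small = [[(x + y) * 30 for x in range(4)] for y in range(4)]
large = [[small[y // 2][x // 2] for x in range(8)] for y in range(8)]
inverted = [[255 - v for v in row] for row in large]

points = sample_points(16)
print(points_match(small, large, points))     # same picture at two sizes
print(points_match(small, inverted, points))  # different picture
```

On real photos a single nearest pixel is noisy, so averaging a small neighbourhood around each point, or raising the tolerance, would likely be needed. What counts as a 'good distribution' of points still depends on the images.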

If you know a lot about the images you will be comparing, they have known characteristics, and they are different more often than they are the same, then implementing a cheap 'quick elimination compare' along the lines of the above could be worthwhile.

user469104 answered Oct 06 '22 01:10


You need to look into the dHash (difference hash) algorithm for this.
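For a rough idea of how dHash works, here is a minimal sketch. It is not the code of the library mentioned in this answer. It assumes the image is already a list of rows of 0-255 grayscale values, and `box_downscale`, `dhash` and `hamming` are illustrative names. The image is shrunk to 9x8 pixels, each pixel is compared with its right-hand neighbour to produce one bit, and the resulting 64-bit hashes are compared by counting differing bits. Whether a bit means "left brighter" or "right brighter" is a convention that varies between implementations.

```python
def box_downscale(pixels, new_w, new_h):
    """Shrink a grayscale image (list of rows of ints) by averaging boxes."""
    old_h, old_w = len(pixels), len(pixels[0])
    out = []
    for y in range(new_h):
        y0 = y * old_h // new_h
        y1 = max(y0 + 1, (y + 1) * old_h // new_h)
        row = []
        for x in range(new_w):
            x0 = x * old_w // new_w
            x1 = max(x0 + 1, (x + 1) * old_w // new_w)
            box = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(box) // len(box))
        out.append(row)
    return out


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    small = box_downscale(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Toy data: an 18x16 gradient, a 36x32 copy, and a mirrored version.
img = [[x * 7 + y * 3 for x in range(18)] for y in range(16)]
big = [[img[y // 2][x // 2] for x in range(36)] for y in range(32)]
mirrored = [row[::-1] for row in img]

print(hamming(dhash(img), dhash(big)))       # small distance: likely duplicates
print(hamming(dhash(img), dhash(mirrored)))  # large distance: different images
```

Because the hash only records whether brightness rises or falls between neighbours, it tends to survive resizing and mild blur. Two images are usually treated as duplicates when the distance is below a small threshold, which you would have to tune for your own data.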

I wrote a pure Java library for exactly this a few days back. You can feed it a directory path (subdirectories included), and it will list the duplicate images, with their absolute paths, that you may want to delete. Alternatively, you can use it to find all the unique images in a directory.

It uses the AWT API internally, so it can't be used on Android. Since ImageIO has problems reading a lot of newer image types, the library uses the TwelveMonkeys jar internally.

https://github.com/srch07/Duplicate-Image-Finder-API

A jar with the dependencies bundled can be downloaded from https://github.com/srch07/Duplicate-Image-Finder-API/blob/master/archives/duplicate_image_finder_1.0.jar

The API can find duplicates among images of different sizes too.

Abhishek Anand answered Oct 06 '22 00:10