
Identifying two identical images using Java

Tags:

java

image

I have a problem in my web crawler, where I am trying to retrieve images from a particular website. The problem is that I often see images that are exactly the same but have different URLs (i.e. addresses).

Is there any Java library or utility that can identify whether two images are exactly the same in their content (i.e. at the pixel level)?

My input will be URLs for the images where I can download them.

asked Mar 26 '09 at 06:03 by shuby_rocks



2 Answers

I've done something very similar to this before in Java, and I found that the PixelGrabber class in the java.awt.image package of the API is extremely helpful (if not downright necessary).
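As a rough illustration of the PixelGrabber route, here is a minimal sketch that assumes both images are already loaded as BufferedImages (for example with javax.imageio.ImageIO.read after downloading). The class and method names (PixelGrabberCompare, grab, samePixels) are hypothetical. The int[]-array constructor of PixelGrabber stores pixels in the default RGB color model, so two images with different internal layouts can still be compared value by value.

```java
import java.awt.Image;
import java.awt.image.BufferedImage;
import java.awt.image.PixelGrabber;
import java.util.Arrays;

class PixelGrabberCompare {
    // Copies every pixel of img into an int[] in the default RGB model (0xAARRGGBB).
    static int[] grab(Image img, int width, int height) {
        int[] pixels = new int[width * height];
        PixelGrabber grabber = new PixelGrabber(img, 0, 0, width, height, pixels, 0, width);
        try {
            // grabPixels() blocks until all pixels are delivered; false means abort/error.
            if (!grabber.grabPixels()) {
                throw new IllegalStateException("pixel grab was aborted or failed");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while grabbing pixels", e);
        }
        return pixels;
    }

    // True only if both images have the same size and identical RGB values everywhere.
    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        int w = a.getWidth(), h = a.getHeight();
        return Arrays.equals(grab(a, w, h), grab(b, w, h));
    }
}
```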

Additionally, you would definitely want to check out the ColorConvertOp class, which performs a pixel-by-pixel color conversion of the data in the source image, with the resulting color values scaled to the precision of the destination image. The documentation also notes that the source and destination can even be the same image (an in-place conversion). Converting both images into the same color model this way means their pixel values can be compared directly, which makes it fairly simple to detect whether they are identical.
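A minimal sketch of the normalize-then-compare idea, assuming in-memory BufferedImages; the class name ColorNormalizeCompare and the choice of TYPE_INT_ARGB as the common target are illustrative assumptions. It uses the ColorConvertOp constructor that takes only RenderingHints, which converts from the source's color space to whatever the destination uses, so the destination image has to be supplied rather than passed as null.

```java
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.ColorConvertOp;

class ColorNormalizeCompare {
    // Converts src into a fresh TYPE_INT_ARGB image via ColorConvertOp.
    static BufferedImage toArgb(BufferedImage src) {
        BufferedImage dst = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_ARGB);
        // The RenderingHints-only constructor converts from the source's color
        // space to the destination's, so a non-null destination is required.
        new ColorConvertOp((RenderingHints) null).filter(src, dst);
        return dst;
    }

    // Normalizes both images the same way, then compares them pixel by pixel.
    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false;
        }
        BufferedImage na = toArgb(a);
        BufferedImage nb = toArgb(b);
        for (int y = 0; y < na.getHeight(); y++) {
            for (int x = 0; x < na.getWidth(); x++) {
                if (na.getRGB(x, y) != nb.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Running both images through the same conversion means the comparison no longer depends on how each source image happens to store its pixels internally.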

If you were detecting similarity rather than exact equality, you would need some form of averaging method, as mentioned in the answer to this question.
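The linked answer is not reproduced here, so the following swaps in one common averaging technique, often called an "average hash": reduce the image to an 8x8 grid of mean brightness values, set one bit for each cell brighter than the overall mean, and compare two hashes by counting differing bits. The class name AverageHash, the 8x8 grid, and the luma weights are assumptions for illustration, not necessarily what that answer describes.

```java
import java.awt.image.BufferedImage;

class AverageHash {
    // 64-bit hash: one bit per cell of an 8x8 grid, set when the cell is brighter than the mean.
    static long hash(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        double[] cells = new double[64];
        for (int cy = 0; cy < 8; cy++) {
            for (int cx = 0; cx < 8; cx++) {
                // Cell bounds; each cell covers at least one pixel even for tiny images.
                int x0 = cx * w / 8, x1 = Math.max((cx + 1) * w / 8, x0 + 1);
                int y0 = cy * h / 8, y1 = Math.max((cy + 1) * h / 8, y0 + 1);
                double sum = 0;
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        sum += luma(img.getRGB(x, y));
                    }
                }
                cells[cy * 8 + cx] = sum / ((x1 - x0) * (y1 - y0));
            }
        }
        double mean = 0;
        for (double c : cells) {
            mean += c;
        }
        mean /= 64;
        long bits = 0;
        for (int i = 0; i < 64; i++) {
            if (cells[i] > mean) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // Perceived brightness of a packed RGB value (Rec. 601 weights).
    static double luma(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return 0.299 * r + 0.587 * g + 0.114 * b;
    }

    // Number of differing bits; small distances suggest visually similar images.
    static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

A distance of 0 means the coarse brightness pattern matches (a resized copy usually lands there); what threshold counts as "similar" is a tuning choice.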

If you can, also check out Volume 2, chapter 7 of Horstmann's Core Java (8th ed.), because there are a whole bunch of examples on image transformations and the like. Again, though, make sure to poke around the java.awt.image package, because you should find that almost everything is prepared for you :)

G'luck!

answered Oct 04 '22 at 10:10 by HipsterZipster


Depending on how detailed you want to get with it:

  • download the image
  • as you download it, generate a hash for it
  • make a directory whose name is the hash value (if the directory does not already exist)
  • if the directory contains 2 or more files, compare the file sizes
  • if the file sizes are the same, do a byte-by-byte comparison of the new image against the bytes of the images already in the directory
  • if the bytes are unique, you have a new image
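The steps above can be sketched with java.nio.file roughly as follows. This is a minimal, hypothetical version: SHA-256 as the hash, the HashDirectoryStore class, and the "<n>.img" file naming are all assumptions rather than part of the answer, and the new download is compared against each file already sitting in its hash directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class HashDirectoryStore {
    private final Path root;

    HashDirectoryStore(Path root) {
        this.root = root;
    }

    // Returns true if the bytes were saved as a new image, false if an
    // identical file already exists in the directory for this hash.
    boolean storeIfNew(byte[] imageBytes) {
        try {
            // One directory per hash value, created on first use.
            Path bucket = root.resolve(sha256Hex(imageBytes));
            Files.createDirectories(bucket);

            List<Path> existing;
            try (Stream<Path> files = Files.list(bucket)) {
                existing = files.collect(Collectors.toList());
            }
            for (Path file : existing) {
                // Cheap size check first, then a full byte-by-byte comparison.
                if (Files.size(file) == imageBytes.length
                        && Arrays.equals(Files.readAllBytes(file), imageBytes)) {
                    return false;
                }
            }
            // Empty directory, or a hash collision with different bytes: a new image.
            Files.write(bucket.resolve(existing.size() + ".img"), imageBytes);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hex-encoded SHA-256 digest, used as the directory name.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

To hash while downloading, as the second step suggests, you could wrap the connection's input stream in java.security.DigestInputStream instead of hashing the finished byte array.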

Regardless of whether you want to do all that, you need to:

  • download the images
  • do a byte-by-byte comparison of the images
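For the bare-minimum version, a sketch like the following streams both URLs and compares them byte by byte, stopping at the first difference. The ByteCompare class and its method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

class ByteCompare {
    // Reads both streams in lockstep; the caller remains responsible for closing them.
    static boolean sameBytes(InputStream a, InputStream b) throws IOException {
        InputStream ba = new BufferedInputStream(a);
        InputStream bb = new BufferedInputStream(b);
        while (true) {
            int x = ba.read();
            int y = bb.read();
            if (x != y) {
                return false; // different byte, or one stream ended early
            }
            if (x == -1) {
                return true; // both streams ended at the same position
            }
        }
    }

    // Downloads both images and compares their raw bytes.
    static boolean sameContent(URL a, URL b) throws IOException {
        try (InputStream ia = a.openStream(); InputStream ib = b.openStream()) {
            return sameBytes(ia, ib);
        }
    }
}
```

Keep in mind this detects identical files: the same picture re-encoded with different compression or metadata will compare as different, which is where a pixel-level comparison comes in.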

No need to rely on any special imaging libraries; images are just bytes.

answered Oct 04 '22 at 11:10 by TofuBeer