I have a problem in my web crawler, where I am trying to retrieve images from a particular website. The problem is that I often see images that are exactly the same but have different URLs, i.e. different addresses.
Is there any Java library or utility that can identify whether two images are exactly the same in their content (i.e. at the pixel level)?
My input will be the URLs of the images, from which I can download them.
The java.awt.image.BufferedImage class extends the Image class to allow the application to operate directly on image data (for example, retrieving or setting a pixel's color).
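A minimal sketch of how BufferedImage could be used for an exact comparison, assuming both images have already been decoded (for instance with ImageIO.read on each image URL). The names PixelCompare and samePixels are made up for illustration and are not part of any library.

```java
import java.awt.image.BufferedImage;

final class PixelCompare {
    /**
     * Returns true when both images have the same dimensions and every
     * pixel has the same ARGB value (hypothetical helper).
     */
    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) {
            return false; // different sizes can never be pixel-identical
        }
        for (int y = 0; y < a.getHeight(); y++) {
            for (int x = 0; x < a.getWidth(); x++) {
                // getRGB returns the pixel as ARGB in the default sRGB color model
                if (a.getRGB(x, y) != b.getRGB(x, y)) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Note that ImageIO.read returns null when no registered reader can decode the data, so a crawler would probably want to check for that before calling a method like this.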
I've done something very similar to this before in Java, and I found the PixelGrabber class in the java.awt.image package of the API extremely helpful (if not downright necessary).
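Roughly, PixelGrabber can copy an image's pixels into an int array, and two such arrays can then be compared directly. The sketch below assumes both images are already loaded and are known to be w by h pixels; GrabCompare, grab and samePixels are illustrative names, not part of the API.

```java
import java.awt.Image;
import java.awt.image.PixelGrabber;
import java.util.Arrays;

final class GrabCompare {
    /** Copies the w x h region at the top-left corner of img into an ARGB array. */
    static int[] grab(Image img, int w, int h) {
        int[] pixels = new int[w * h];
        // offset 0 and scansize w: rows are stored back to back in the array
        PixelGrabber grabber = new PixelGrabber(img, 0, 0, w, h, pixels, 0, w);
        try {
            if (!grabber.grabPixels()) {
                throw new IllegalStateException("pixel grab failed or was aborted");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while grabbing pixels", e);
        }
        return pixels;
    }

    /** Assumes the caller has already checked that both images are w x h. */
    static boolean samePixels(Image a, Image b, int w, int h) {
        return Arrays.equals(grab(a, w, h), grab(b, w, h));
    }
}
```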
Additionally, you would definitely want to check out the ColorConvertOp class, which performs a pixel-by-pixel color conversion of the data in the source image; the resulting color values are scaled to the precision of the destination image. The documentation goes on to say that the source and destination can even be the same image, in which case it would be quite simple to detect if they are identical.
If you are detecting similarity rather than exact equality, you need to use some form of averaging method, as mentioned in the answer to this question.
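The linked answer is not reproduced here, so the following is only one possible reading of "averaging method": a so-called average hash. It shrinks the image to 8x8 grayscale, computes the mean brightness, and records one bit per pixel depending on whether that pixel is brighter than the mean. The class name AverageHash and the 8x8 size are assumptions made for this sketch.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class AverageHash {
    private static final int SIZE = 8; // 8 x 8 = 64 bits, fits in one long

    /** Computes a 64-bit average hash of the image (illustrative sketch). */
    static long hash(BufferedImage src) {
        // Scale down and convert to grayscale in one drawing step
        BufferedImage small = new BufferedImage(SIZE, SIZE, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = small.createGraphics();
        try {
            g.drawImage(src, 0, 0, SIZE, SIZE, null);
        } finally {
            g.dispose();
        }
        int[] gray = new int[SIZE * SIZE];
        int sum = 0;
        for (int y = 0; y < SIZE; y++) {
            for (int x = 0; x < SIZE; x++) {
                int v = small.getRaster().getSample(x, y, 0);
                gray[y * SIZE + x] = v;
                sum += v;
            }
        }
        int mean = sum / gray.length;
        long bits = 0L;
        for (int i = 0; i < gray.length; i++) {
            if (gray[i] > mean) {
                bits |= 1L << i; // bit i is set when pixel i is brighter than the mean
            }
        }
        return bits;
    }

    /** Number of differing bits; smaller means more similar. */
    static int distance(long a, long b) {
        return Long.bitCount(a ^ b);
    }
}
```

Identical images always give distance 0, but a small distance only suggests similarity; any threshold would have to be tuned on real data.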
If you can, also check out Volume 2, chapter 7 of Horstmann's Core Java (8th ed.), because there's a whole bunch of examples on image transformations and the like. But again, make sure to poke around the java.awt.image package, because you should find you have almost everything prepared for you :)
G'luck!
Depending on how detailed you want to get with it:
Regardless of whether you want to do all that or not, you need to:
No need to rely on any special imaging libraries; images are just bytes.
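As a rough sketch of the bytes-only idea, a crawler could fingerprint each downloaded file with SHA-256 and skip any file whose fingerprint it has already seen. The class SeenImages and its methods are hypothetical names; the byte array is assumed to be the raw download (for example, read from url.openStream()).

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

final class SeenImages {
    private final Set<String> fingerprints = new HashSet<>();

    /** Returns the SHA-256 digest of the data as 64 lowercase hex characters. */
    static String fingerprint(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            // BigInteger(1, ...) treats the digest as unsigned; %064x keeps leading zeros
            return String.format("%064x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Returns true the first time these bytes are seen, false for a byte-identical repeat. */
    boolean addIfNew(byte[] imageBytes) {
        return fingerprints.add(fingerprint(imageBytes));
    }
}
```

Byte-identical files are always pixel-identical, but the reverse does not hold: the same picture saved again with different compression or metadata produces different bytes, so this only catches exact file duplicates.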