I have a directory of images (photos, web graphics, logos, etc.), all pulled from the web. There are .jpg, .gif, and .png files.
I would like to extract the images that are of real things (keep photos and remove graphics). I'm not trying to identify actual / original photographs, just images of real-life subjects as opposed to computer-made graphics (I'm not sure how to say this more clearly). Almost all of these images have been manipulated, so EXIF information will not be available.
A large (even very large) margin of error is acceptable.
I've already:
imagecolorstotal()
I'm thinking about removing images with histogram values concentrated around certain colors, rather than a smooth or distributed curve. I have not attempted this yet.
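To make the histogram idea concrete, here is a rough sketch, not a tested classifier. It assumes the GD extension is loaded; the function names colorHistogram and histogramConcentration, the 512-bin quantization, the sampling step and the choice of 8 top bins are all illustrative assumptions. The idea is that flat graphics and logos tend to put most pixels into a handful of color bins, while photos spread them out.

```php
<?php
// Build a coarse color histogram: 3 bits per channel = 512 bins.
// Assumes the GD extension; samples every $step-th pixel for speed.
function colorHistogram(string $path, int $step = 4): array
{
    $data = file_get_contents($path);
    $im = ($data === false) ? false : imagecreatefromstring($data);
    if ($im === false) {
        return []; // unreadable or unsupported image
    }
    $hist = [];
    $w = imagesx($im);
    $h = imagesy($im);
    for ($y = 0; $y < $h; $y += $step) {
        for ($x = 0; $x < $w; $x += $step) {
            // imagecolorsforindex works for both palette and truecolor images.
            $c = imagecolorsforindex($im, imagecolorat($im, $x, $y));
            $bin = (($c['red'] >> 5) << 6) | (($c['green'] >> 5) << 3) | ($c['blue'] >> 5);
            $hist[$bin] = ($hist[$bin] ?? 0) + 1;
        }
    }
    return $hist;
}

// Fraction of sampled pixels that land in the $topBins most populated bins.
// Values near 1.0 mean the colors are concentrated in a few bins.
function histogramConcentration(array $histogram, int $topBins = 8): float
{
    $total = array_sum($histogram);
    if ($total <= 0) {
        return 0.0;
    }
    rsort($histogram); // largest bins first (works on a local copy)
    return array_sum(array_slice($histogram, 0, $topBins)) / $total;
}
```

An image could then be flagged as a likely graphic when histogramConcentration(colorHistogram($path)) exceeds some cut-off. Any specific cut-off (0.9, say) is a guess that would need tuning against a sample of the actual images.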
How else can I improve this filtering of images to extract (mostly) real photos? I'd prefer to use PHP but that is not required.
UPDATE: It turns out that for my application, the first three things I had already tried were a solid 80% solution. Further filtering can be done using some of the answers below.
The function exif_read_data can provide information about the camera used; what it returns differs greatly from camera to camera. This won't be a perfect solution, but it should add to what you are already using.
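A minimal sketch of that check, assuming the exif extension is enabled. The helper names hasCameraTags and looksLikeCameraPhoto are made up for illustration. Note that exif_read_data only handles JPEG and TIFF files, and a missing tag proves nothing, since edited or re-saved images often have their metadata stripped.

```php
<?php
// True when the EXIF array carries a non-empty camera Make or Model tag.
// $exif is whatever exif_read_data() returned (an array, or false on failure).
function hasCameraTags($exif): bool
{
    if (!is_array($exif)) {
        return false;
    }
    foreach (['Make', 'Model'] as $tag) {
        if (isset($exif[$tag]) && is_string($exif[$tag]) && trim($exif[$tag]) !== '') {
            return true;
        }
    }
    return false;
}

// Convenience wrapper: a positive result is a strong hint of a real photo,
// while a negative result is inconclusive.
function looksLikeCameraPhoto(string $path): bool
{
    if (!function_exists('exif_read_data')) {
        return false; // exif extension not available
    }
    // @ suppresses the warning raised for non-JPEG/TIFF or damaged files.
    return hasCameraTags(@exif_read_data($path));
}
```

Used this way, the EXIF check works best as a positive signal only: images with camera tags can be kept outright, and everything else goes on to the other filters.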