
How can I distinguish between graphics and photographs?

I have a directory of images, photos, web graphics, logos, etc... these are all pulled from the web. There are .jpg, .gif, and .png files.

I would like to keep images of real things (photos) and discard graphics. I'm not trying to find actual, original photographs, just images of real-life subjects as opposed to computer-made graphics (I'm not sure how to say this more clearly). Almost all of these images have been manipulated, and EXIF information will not be available.

A large (even very large) margin of error is acceptable.

I've already:

  • removed images with low color counts using imagecolorstotal()
  • removed images that have large height to width ratios, and vice versa (a ratio of 3+ works shockingly well).
  • removed images that are smaller than a certain dimension (50-75px is good)
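For reference, the three filters above can be combined into a single predicate. This is only a sketch in Python rather than PHP: the function name `looks_like_photo_candidate` is hypothetical, the caller is assumed to supply the image's width, height, and number of distinct colors, and the `min_colors=256` cutoff is an assumed value (the ratio of 3 and the 75px minimum come from the list above).

```python
def looks_like_photo_candidate(width, height, color_count,
                               min_side=75, max_ratio=3.0, min_colors=256):
    """Return True if an image survives the three cheap filters.

    min_side and max_ratio follow the values mentioned in the question;
    min_colors is an assumed threshold and should be tuned on real data.
    """
    if width <= 0 or height <= 0:
        return False
    # Filter 1: tiny images are usually icons, buttons, or spacers.
    if min(width, height) < min_side:
        return False
    # Filter 2: very elongated images are usually banners or dividers.
    ratio = max(width, height) / min(width, height)
    if ratio >= max_ratio:
        return False
    # Filter 3: a low color count suggests a logo or flat graphic.
    if color_count < min_colors:
        return False
    return True
```

The checks are ordered cheapest first, so color counting (the only step that needs pixel data) can be skipped for images already rejected by their dimensions.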

I'm thinking about removing images with histogram values concentrated around certain colors, rather than a smooth or distributed curve. I have not attempted this yet.
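One way to put a number on "concentrated around certain colors" is to measure what share of the pixels falls into the few most common colors. The sketch below is an assumption about how this might be done, not a tested filter: `top_color_share` and `is_concentrated` are hypothetical names, pixels are assumed to arrive as `(r, g, b)` tuples with 8-bit channels, and the bin size, `top_n`, and the 0.6 threshold are placeholder values.

```python
from collections import Counter

def top_color_share(pixels, top_n=5, bits=4):
    """Fraction of pixels that fall in the top_n most common quantized colors.

    Each 8-bit channel is reduced to `bits` bits so that near-identical
    shades land in the same histogram bin.
    """
    shift = 8 - bits
    counts = Counter((r >> shift, g >> shift, b >> shift) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in counts.most_common(top_n))
    return top / total

def is_concentrated(pixels, threshold=0.6, top_n=5, bits=4):
    """True if a few colors dominate, which hints at a graphic rather than a photo."""
    return top_color_share(pixels, top_n=top_n, bits=bits) >= threshold
```

A flat two-color logo scores close to 1.0, while an image whose pixels are spread evenly over many bins scores close to 0. Running this on a downscaled copy of each image should keep the cost manageable for a large directory.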

How else can I improve this filtering of images to extract (mostly) real photos? I'd prefer to use PHP but that is not required.

UPDATE: It turns out that for my application, the first three things I had already tried were a solid 80% solution. Further filtering can be done using some of the answers below.

T. Brian Jones asked Aug 09 '11 10:08


1 Answer

The function exif_read_data() can provide information about the camera used, although what is recorded differs greatly from camera to camera. This won't be a perfect solution, but it should add to what you are already using.
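As a language-neutral illustration of the same idea, the sketch below checks whether a JPEG carries an EXIF block at all, without decoding any camera fields. It is not what exif_read_data() does internally; it is a minimal stdlib-only Python sketch, `has_exif_segment` is a hypothetical name, and it assumes the file's bytes have already been read into memory.

```python
import struct

def has_exif_segment(data: bytes) -> bool:
    """Return True if JPEG bytes contain an APP1 segment with an Exif header."""
    if data[:2] != b"\xff\xd8":            # no SOI marker: not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:              # lost marker sync: give up
            return False
        marker = data[pos + 1]
        if marker == 0xFF:                 # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):         # start of scan / end of image
            return False
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return False
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

Because web images are often stripped of metadata, a missing EXIF block says little on its own; a present one is a mild hint that the file came from a camera.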

Nick Maroulis answered Oct 17 '22 04:10