
How to compare images and determine which has more content?

Goal: I want to grab the best frame from an animated GIF and use it as a static preview image. I believe the best frame is one that shows the most content - not necessarily the first or last frame.

Take this GIF for example:

[image: the animated GIF]

--

This is the first frame:

[image: the first frame]

--

Here is the 28th frame:

[image: the 28th frame]

It's clear that the 28th frame represents the entire GIF well.

How could I programmatically determine whether one frame has more pixels/content than another? Any thoughts, ideas, packages/modules, or articles you can point me to would be greatly appreciated.

asked Aug 08 '17 by floatleft


2 Answers

One straightforward way this could be accomplished would be to estimate the entropy of each image and choose the frame with maximal entropy.

In information theory, entropy can be thought of as the "randomness" of the image. An image of a single color is very predictable, while the flatter the pixel-value distribution, the more random the image. This is closely related to the compression method described by Arthur-1, as entropy is a lower bound on how much the data can be losslessly compressed.

Estimating Entropy

One way to estimate the entropy is to approximate the probability mass function of pixel intensities using a histogram. To generate the plot below, I first convert the image to grayscale, then compute the histogram using a bin spacing of 1 (for pixel values from 0 to 255), and finally normalize the histogram so that the bins sum to 1. This normalized histogram is an approximation of the pixel probability mass function.
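As a sketch of this step in plain Ruby, assuming the grayscale pixel values have already been extracted into a flat array (e.g. with an image gem such as chunky_png; the helper name here is made up for illustration):

```ruby
# Approximate the pixel probability mass function with a normalized histogram.
# `pixels` is assumed to be a flat array of grayscale values in 0..255.
def pixel_pmf(pixels)
  counts = Hash.new(0)
  pixels.each { |v| counts[v] += 1 }   # histogram with bin spacing of 1
  n = pixels.length.to_f
  counts.transform_values { |c| c / n } # normalize so the bins sum to 1
end

pixel_pmf([0, 0, 255, 255])  # => {0=>0.5, 255=>0.5}
```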

Using this probability mass function, we can estimate the entropy of the grayscale image, which is given by the following equation:

H = E[-log(p(x))]

where H is the entropy, E is the expected value, and p(x) is the probability that a given pixel takes the value x.

Programmatically, H can be estimated by computing -p(x)*log(p(x)) for each value p(x) in the normalized histogram and summing the results.
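That sum can be sketched in a few lines of plain Ruby, operating on a pmf represented as a value => probability hash (a hypothetical representation chosen for this sketch, not from the original answer):

```ruby
# Entropy (in bits, hence log base 2) of a probability mass function
# given as a hash of value => probability.
def entropy(pmf)
  pmf.values.sum { |p| p.zero? ? 0.0 : -p * Math.log2(p) }
end

# Example: a fair coin has exactly 1 bit of entropy.
entropy({ 0 => 0.5, 1 => 0.5 })  # => 1.0
```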

Here is a plot of entropy vs. frame number for your example:

[image: plot of entropy vs. frame number]

Frame 21 (the 22nd frame) has the highest entropy:

[image: frame 21]

Observations

  • The entropy computed here is not the true entropy of the image, because it assumes each pixel is sampled independently from the same distribution. Computing the true entropy would require the joint distribution of all pixels, which we can't know without modeling the underlying random process that generated the images (which would include human interaction). In practice, though, this per-pixel estimate gives a reasonable measure of how much content is in the image.

  • This method will fail if a not-so-interesting frame contains much more noise (randomly colored pixels) than the most interesting frame, because noise has high entropy. For example, the following image is pure uniform noise and therefore has the maximum possible entropy (H = 8 bits), meaning no compression is possible.

[image: 200x200 uniform noise]
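The 8-bit maximum is easy to verify with a small sketch: an array in which every intensity 0..255 occurs equally often yields exactly 8 bits from the histogram estimator (plain Ruby, helper name made up for illustration):

```ruby
# Histogram-based entropy estimate (in bits) from an array of
# grayscale pixel values in 0..255.
def entropy_bits(pixels)
  n = pixels.length.to_f
  counts = Hash.new(0)
  pixels.each { |v| counts[v] += 1 }
  counts.values.sum { |c| -(c / n) * Math.log2(c / n) }
end

uniform = (0..255).to_a * 4  # every intensity appears equally often
entropy_bits(uniform)        # => 8.0, the maximum for 8-bit pixels
```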

Ruby Implementation

I don't know Ruby, but it looks like one of the answers to this question refers to a package for computing the entropy of an image.

From m. simon borg's comment

FWIW, using Ruby's File.size() returns 1904 bytes for the 28th frame image and 946 bytes for the first frame image – m. simon borg

File.size() should be roughly proportional to entropy.


As an aside, if you check the size of the 200x200 noise image on disk, you will see that the file is 40,345 bytes even after compression, while the uncompressed grayscale data is only 40,000 bytes (200 × 200 × 1 byte per pixel). Information theory tells us that no compression scheme can losslessly compress such images on average.

answered Sep 18 '22 by jodag


There are a couple of ways I might go about this. My first thought (perhaps not the most practical solution, but it seems theoretically interesting!) would be to losslessly compress each frame; in theory, the frame with the least repeatable content (and thus the most unique content) would have the largest compressed size, so you could then compare the size in bytes of each compressed frame. The accuracy of this approach would likely depend heavily on the image passed in.
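A minimal sketch of this idea using Ruby's standard Zlib library, with stand-in byte strings in place of real frames (actual GIF frames would have to come from a decoder, which this sketch assumes away):

```ruby
require 'zlib'

# Compression heuristic: losslessly compress each frame's raw bytes and
# use the compressed size as a proxy for how much content the frame has.
flat_frame = "\x00" * 10_000              # stand-in: one solid color
busy_frame = Random.new(42).bytes(10_000) # stand-in: noisy, varied content

flat_size = Zlib::Deflate.deflate(flat_frame).bytesize
busy_size = Zlib::Deflate.deflate(busy_frame).bytesize
# busy_size will be far larger than flat_size, so the busy frame "wins"
```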

A more realistic/practical solution might be to grab the predominant color in the GIF (the background color, in your example), then iterate through each pixel and increment a counter whenever the current pixel's color doesn't match the background color.
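A rough sketch of that counting approach in plain Ruby, assuming the pixel colors are already available as a flat array (the helper name is made up for illustration):

```ruby
# Treat the most common pixel value as the background and count
# the pixels that differ from it ("foreground" content).
def foreground_pixel_count(pixels)
  background = pixels.tally.max_by { |_value, count| count }.first
  pixels.count { |v| v != background }
end

foreground_pixel_count([7, 7, 7, 7, 1, 2])  # => 2 (background is 7)
```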

I'm thinking about some more optimized/sample-based solutions and will edit my response to include them a little later, if performance is a concern for you.

answered Sep 21 '22 by Arthur-1