I'm trying to calculate the similarity (read: Levenshtein distance) of two images, using Python 2.6 and PIL.
I plan to use the python-levenshtein library for fast comparison.
Main question:
What is a good strategy for comparing images? My idea is something like:
Of course, this will not handle cases like mirrored images, cropped images, etc. But for basic comparison, this should be useful.
Is there a better strategy documented somewhere?
EDIT: Aaron H is right about the speed issue. Calculating the Levenshtein distance takes forever for images bigger than a few hundred by a few hundred pixels. However, the difference between the results after downscaling to 100x100 and 200x200 is less than 1% in my example, so it might be wise to cap the image size at ~100px or so...
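For a sense of why this gets slow: the textbook dynamic-programming algorithm does one step per pair of elements, so comparing two images of N pixels each takes roughly N*N steps. Here is a minimal pure-Python sketch of that algorithm. It is not the python-levenshtein implementation (which is written in C), and `levenshtein` and `cell_count` are names made up for this illustration.

```python
def levenshtein(a, b):
    """Edit distance between two sequences, keeping only two rows of the table."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, 1):
        current = [i]
        for j, item_b in enumerate(b, 1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cell_count(width, height):
    """Number of table cells visited when comparing two width x height images."""
    pixels = width * height
    return pixels * pixels


# Works on any sequence, e.g. a flat list of grayscale pixel values.
print(levenshtein([0, 0, 255], [0, 255, 255]))
print(cell_count(100, 100), cell_count(200, 200))
```

Going from 100x100 to 200x200 multiplies the number of cells by 16, which fits the slowdown described above.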
EDIT: Thanks PreludeAndFugue, that question is what I was looking for.
By the way, it seems the Levenshtein distance can be optimized, but it is giving me some really bad results, perhaps because there are lots of redundant elements in the backgrounds. I'll have to look at some other algorithms.
EDIT: Root mean square deviation and peak signal-to-noise ratio seem to be two other options that are not very hard to implement and are seemingly not very CPU-expensive. However, it seems I'm going to need some kind of context analysis for recognizing shapes, etc.
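A rough sketch of both measures, assuming each image has already been flattened to a list of grayscale values of equal length (with PIL, something like `list(img.convert("L").getdata())` should give that). The function names `rmsd` and `psnr` and the 8-bit `max_value` default are my own choices for the example.

```python
import math


def rmsd(pixels_a, pixels_b):
    """Root mean square deviation between two equally sized pixel sequences."""
    if len(pixels_a) != len(pixels_b) or not pixels_a:
        raise ValueError("images must be non-empty and have the same pixel count")
    squared = sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b))
    return math.sqrt(squared / float(len(pixels_a)))


def psnr(pixels_a, pixels_b, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = rmsd(pixels_a, pixels_b)
    if error == 0:
        return float("inf")
    # 20 * log10(MAX / RMSD) is the same as 10 * log10(MAX**2 / MSE)
    return 20 * math.log10(max_value / error)


print(rmsd([0, 0, 0, 0], [2, 2, 2, 2]))
print(psnr([0, 0, 0, 0], [2, 2, 2, 2]))
```

A lower RMSD and a higher PSNR both mean the images are closer. Both are single passes over the pixels, so they scale linearly with image size.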
Anyway, thanks for all the links, and also for pointing out the direction towards NumPy/SciPy.
Finding the difference between two images using the PIL library
To find the difference, load the two images in the interpreter and then use ImageChops to compute the difference between them; the output is self-explanatory.
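A small sketch of the ImageChops approach, assuming Pillow (or a PIL version that supports `from PIL import ...`) is installed. The helper name `difference_bbox` is made up, and the two in-memory images stand in for files you would normally open with `Image.open`.

```python
from PIL import Image, ImageChops


def difference_bbox(image_a, image_b):
    """Bounding box of the region where the images differ, or None if they match."""
    # Convert both to RGB so the two images share a mode before subtracting.
    diff = ImageChops.difference(image_a.convert("RGB"), image_b.convert("RGB"))
    return diff.getbbox()


# Two 4x4 black images; the second gets a single red pixel at (2, 1).
first = Image.new("RGB", (4, 4), (0, 0, 0))
second = first.copy()
second.putpixel((2, 1), (255, 0, 0))

print(difference_bbox(first, first.copy()))
print(difference_bbox(first, second))
```

`getbbox()` returns `None` when the difference image is entirely black, so this only answers "are they identical, and if not, where do they differ"; it does not give a similarity score.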
In general, we can accomplish this in two ways. The first method is to use locality sensitive hashing, which I'll cover in a later blog post. The second method is to use algorithms such as Mean Squared Error (MSE) or the Structural Similarity Index (SSIM).
The similarity of the two images is detected using the package “imagehash”. If two images are identical or almost identical, the imagehash difference will be 0. Two images are more similar if the imagehash difference is closer to 0.
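To illustrate the idea behind this kind of hash comparison, here is a pure-Python sketch of an average hash and the Hamming distance between two hashes. This is not the imagehash package's code. It assumes the image has already been shrunk to a small grayscale grid (8x8 is common) and flattened into a list, and the function names are my own.

```python
def average_hash(gray_pixels):
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    if not gray_pixels:
        raise ValueError("need at least one pixel")
    mean = sum(gray_pixels) / float(len(gray_pixels))
    return [1 if value > mean else 0 for value in gray_pixels]


def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(1 for bit_a, bit_b in zip(hash_a, hash_b) if bit_a != bit_b)


# Hypothetical 8x8 images: dark top half, bright bottom half.
original = [10] * 32 + [200] * 32
slightly_changed = [12] * 32 + [190] * 32   # same structure, shifted brightness
inverted = [200] * 32 + [10] * 32           # opposite structure

print(hamming_distance(average_hash(original), average_hash(slightly_changed)))
print(hamming_distance(average_hash(original), average_hash(inverted)))
```

Because only "brighter or darker than the mean" is kept, small brightness changes leave the hash untouched, while a structurally different image flips many bits.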
Pillow was announced as a replacement for PIL for future usage. Pillow supports a large number of image file formats including BMP, PNG, JPEG, and TIFF.
Check out imgSeek:
imgSeek is a collection of free open source visual similarity projects. The query (image you are looking for) can be expressed either as a rough sketch painted by the user or as another image you supply (or an image in your collection). The searching algorithm makes use of multiresolution wavelet decomposition of the query and database images.
You can take a look at the stsci library; it is made for comparing and analysing images. It should give you what you want but might be a little overkill.
If you want to keep it simple, you could reduce the number of colors and the resolution first and then calculate the distance.
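The keep-it-simple route could look roughly like this. It is a pure-Python sketch on grayscale images stored as lists of rows; all function names, the block-averaging downscale, and the mean absolute difference used as the "distance" are assumptions picked for the example, not something the answer specifies.

```python
def shrink(rows, factor):
    """Downscale by averaging factor x factor blocks (leftover edge pixels are dropped)."""
    height = len(rows) - len(rows) % factor
    width = len(rows[0]) - len(rows[0]) % factor
    small = []
    for top in range(0, height, factor):
        small_row = []
        for left in range(0, width, factor):
            block = [rows[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            small_row.append(sum(block) // len(block))
        small.append(small_row)
    return small


def quantize(rows, levels=4, max_value=255):
    """Reduce the number of gray levels by mapping each pixel to a bucket index."""
    step = (max_value + 1) // levels
    return [[min(value // step, levels - 1) for value in row] for row in rows]


def simple_distance(rows_a, rows_b, factor=2, levels=4):
    """Mean absolute difference after reducing resolution and gray levels."""
    a = quantize(shrink(rows_a, factor), levels)
    b = quantize(shrink(rows_b, factor), levels)
    pairs = [(x, y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    if not pairs:
        raise ValueError("images are too small for this factor")
    return sum(abs(x - y) for x, y in pairs) / float(len(pairs))


# Hypothetical 4x4 images: left half dark, right half bright.
image_a = [[0, 0, 255, 255] for _ in range(4)]
image_b = [[5, 3, 250, 252] for _ in range(4)]   # same picture with a bit of noise
image_c = [[255, 255, 0, 0] for _ in range(4)]   # mirrored

print(simple_distance(image_a, image_b))
print(simple_distance(image_a, image_c))
```

The reduction step throws away small pixel-level noise before comparing, so near-duplicates end up with a distance at or near zero.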