Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast and simple image hashing algorithm

I need a (preferably simple and fast) image hashing algorithm. The hash value is used in a lookup table, not for cryptography.

Some of the images are "computer graphic" - i.e. solid-color filled rects, rasterized texts and etc., whereas there are also "photographic" images - containing rich color spectrum, mostly smooth, with reasonable noise amplitude.

I'd also like the hashing algorithm to be able to be applied to specific image parts. I mean, the image can be divided into a grid cells, and the hash function of each cell should depend only on the contents of this cell. So that one may spot quickly if two images have common areas (in case they're aligned appropriately).

Note: I only need to know if two images (or their parts) are identical. That is, I don't need to match similar images, there's no need in feature recognition, correlation, and other DSP techniques.

I wonder what is the preferred hashing algorithm.

For "photographic" images just XOR-ing all the pixels within a grid cell is ok more-or-less. The probability of the same hash value for different images is pretty low, especially because the presence of the (nearly white) noise breaks all the potential symmetries. Plus the spectrum of such a hash function looks good (any value is possible with nearly the same probability).

But such a naive algorithm may not be used with "artificial" graphics. Identical pixels, repeating patterns, geometrical offset invariance are very common for such images. XOR-ing all the pixels will give 0 for any image with even number of identical pixels.

Using something like CRT-32 looks somewhat promising, but I'd like to figure-out something faster. I thought about iterative formula, each new pixel mutates the current hash value, like this:

hashValue = (hashValue * /*something*/ | newPixelValue) % /* huge prime */

Doing modulo prime number should probably give a good dispersion, so that I'm leaning toward this option. But I'd like to know if there are better varians.

Thanks in advance.

like image 721
valdo Avatar asked Jul 04 '12 23:07

valdo


People also ask

What is the fastest hashing algorithm?

SHA-1 is fastest hashing function with ~587.9 ms per 1M operations for short strings and 881.7 ms per 1M for longer strings. MD5 is 7.6% slower than SHA-1 for short strings and 1.3% for longer strings. SHA-256 is 15.5% slower than SHA-1 for short strings and 23.4% for longer strings.

What is simple hashing algorithm?

A hashing algorithm is a cryptographic hash function. It is a mathematical algorithm that maps data of arbitrary size to a hash of a fixed size. A hash function algorithm is designed to be a one-way function, infeasible to invert.

What is the best hashing algorithm?

Probably the one most commonly used is SHA-256, which the National Institute of Standards and Technology (NIST) recommends using instead of MD5 or SHA-1. The SHA-256 algorithm returns hash value of 256-bits, or 64 hexadecimal digits.


2 Answers

If you want to make it very fast, you should consider taking a random subset of the pixels to avoid reading the entire image. Next, compute a hash function on the sequence of values at those pixels. The random subset should be selected by a deterministic pseudo-random number generator with fixed seed so that identical images produce identical subsets and consequently identical hash values.

This should work reasonably well even for artificial images. However, if you have images which differ from each other by a small number of pixels, this is going to give hash collisions. More iterations give better reliability. If that is the case, for instance, if your images set is likely to have pairs with one different pixel, you must read every pixel to compute the hash value. Taking a simple linear combination with pseudo-random coefficients would be good enough even for artificial images.

pseudo-code of a simple algorithm

Random generator = new generator(2847)  // Initialized with fixed seed
int num_iterations = 100

int hash(Image image) {
    generator.reset()   //To ensure consistency on each evaluation
    int value = 0
    for num_iteration steps {
        int nextValue = image.getPixel(generator.nextInt()%image.getSize()).getValue()
        value = value + nextValue*generator.nextInt()
    }
    return value
}
like image 166
akashnil Avatar answered Oct 23 '22 10:10

akashnil


Have a look at this tutorial on the phash algorithm http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html which is used to find closely matching images.

like image 20
Micromega Avatar answered Oct 23 '22 12:10

Micromega