Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A cheap/fast way to hash Bitmaps?

I have an application that takes a gallery of pictures (all in Jpeg) and give similarity scores between each possible pairs. At every point in time, only one pair can be selected and its similarity score is displayed.

The algorithm that compare the two images has a certain performance cost, such that it takes a few seconds to compare a pair.

When two pictures are selected:

  1. If the pair has never been compared, the score shows "Not scored yet.". The user can click the "Score" button and the pair will be sent to a thread that queues scores to be computed. Example: http://db.tt/gb1Yk6yx
  2. If the pair is currently in the queue to be computed, the score field shows "Computing...". Example: http://db.tt/OvS1qGP3
  3. If the pair has been compared, the score attached to the pair is shown. Example: http://db.tt/m2OQGybW

Example (when doing a batch): http://db.tt/iD67SdCp

If a score has never been computed, and a user click "Score", the field will switch to "Computing..." then will display the score when the computation is completed.

Before displaying anything in the field of the score, when two pairs are selected, their attached Bitmap are sent to a HashMap that verify if those two Bitmaps already have an attached score, in which case it simply return it. If there's no score, then the job is sent in the queue.

To know if the score exists in the cache, I need to find a way to hash the pair so that I can use the resulting key to lookup the cache. That's where my problem is. To make sense, the hashing of the two Bitmap should be fast. Otherwise, I'm just adding another layer of computation. But, the way I do so far to hash the two Bitmap is to send them in a byte array and get their MD5 checksum. Like this:

private Long getHashKey(Bitmap first, Bitmap second){

    // TODO this IS costly, it render useless the cache optimization.
    // also, it doesn't detect that comp(A,B) is the same as comp(B,A).
    // much work to do here.

    if(D) Profiling.start(TAG, "getHashKey");

    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    first.compress(Bitmap.CompressFormat.JPEG, 100, stream);

    byte[] firstArray = stream.toByteArray();
    second.compress(Bitmap.CompressFormat.JPEG, 100, stream);

    byte[] secondArray = stream.toByteArray();
    byte[] bitmapBuffer = new byte[firstArray.length + secondArray.length];

    System.arraycopy(firstArray, 0, bitmapBuffer, 0, firstArray.length);

    System.arraycopy(secondArray, 0, bitmapBuffer, 
            firstArray.length, secondArray.length);

    Adler32 md5Hash = new Adler32();
    md5Hash.update(bitmapBuffer);
    long hashKey = md5Hash.getValue();

    if(D) Profiling.stop();

    return hashKey;
}

However, this method, according to the profiling I did, cost about 53 ms to run, which causes a lag in the UI that is quite unpleasant. TIn more detailed profiling, I found that roughly 95% of the computing time is done in the compress methods. However, I have not found another way to get the bytes backing the Bitmaps.

05-26 17:56:13.220: D/Profiling(9458): Profile for ImageCompareActivity.getHashKey:
05-26 17:56:13.220: D/Profiling(9458): >          Count : 1996 calls
05-26 17:56:13.220: D/Profiling(9458): >  Total runtime : 105765140 us
05-26 17:56:13.220: D/Profiling(9458): >    Avg runtime : 52988 us

I know my way to hash the Bitmap is quite brute. But I don't know much about hashing functions, and which parts of a Bitmap I could use to uniquely identify the files. I don't want to use the filename or something like that, as I want to send those Bitmaps in a database eventually.

[Update 1] I didn't know about Object.hashCode(). Now, I modified the method like this:

private Integer getHashKey(Bitmap first, Bitmap second){

    if(D) Profiling.start(TAG, "getHashKey");

    Integer hashKey = new Integer(
            1013 * (first.hashCode()) ^ 1009 * (second.hashCode()) ); 

    if(D) Profiling.stop();

    return hashKey;
}

Which runs in average for about 18 us.

like image 666
AntoineG Avatar asked May 26 '12 15:05

AntoineG


1 Answers

Here is a recent question about hashing. Adler is probably the fastest method built-in to the JRE. Have you considered pre-computing the hash and storing it with the image, or in a database?

like image 88
bmm6o Avatar answered Oct 22 '22 14:10

bmm6o