Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

understanding of hash code

Tags:

java

hash

hash function is important in implementing hash table. I know that in java Object has its hash code, which might be generated from weak hash function.

Following is one snippet that is "supplement hash function"

static int hash(Object x) {
    int h = x.hashCode();

    h += ~(h << 9);
    h ^=  (h >>> 14);
    h +=  (h << 4);
    h ^=  (h >>> 10);
    return h;
}

Can anybody help to explain what is the fundamental idea of a hash algorithm ? to generate non-duplicate integer? If so, how does these bitwise operations make it?

like image 964
SecureFish Avatar asked Jun 25 '10 21:06

SecureFish


People also ask

What do you mean by hash code?

What Does Hash Code Mean? Hash code in . NET framework is a numeric value which helps in identification of an object during equality testing and also can serve as an index for the object. The value contained in the hash code is not permanent in nature.

What does a hash value tell you?

A hash value is a numeric value of a fixed length that uniquely identifies data. Hash values represent large amounts of data as much smaller numeric values, so they are used with digital signatures. You can sign a hash value more efficiently than signing the larger value.

How do you read a hash value?

A hash function is a mathematical function that converts an input value into a compressed numerical value – a hash or hash value. Basically, it's a processing unit that takes in data of arbitrary length and gives you the output of a fixed length – the hash value.

Is it possible to decode a hash code?

No, you cannot decode hashes. A hash is a one-way (almost unique) representation of a piece of data. You could use rainbow tables, brute-force or dictionary attacks on the hashes to recover the unencrypted password.


3 Answers

A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array. The values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes. (wikipedia)

Using more "human" language object hash is a short and compact value based on object's properties. That is if you have two objects that vary somehow - you can expect their hash values to be different. Good hash algorithm produces different values for different objects.

like image 153
Vadym Stetsiak Avatar answered Sep 29 '22 23:09

Vadym Stetsiak


What you are usually trying to do with a hash algorithm is convert a large search key into a small nonnegative number, so you can look up an associated record in a table somewhere, and do it more quickly than M log2 N (where M is the cost of a "comparison" and N is the number of items in the "table") typical of a binary search (or tree search).

If you are lucky enough to have a perfect hash, you know that any element of your (known!) key set will be hashed to a unique, different value. Perfect hashes are primarily of interest for things like compilers that need to look up language keywords.

In the real world, you have imperfect hashes, where several keys all hash to the same value. That's OK: you now only have to compare the key to a small set of candidate matches (the ones that hash to that value), rather than a large set (the full table). The small sets are traditionally called "buckets". You use the hash algorithm to select a bucket, then you use some other searchable data structure for the buckets themselves. (If the number of elements in a bucket is known, or safely expected, to be really small, linear search is not unreasonable. Binary search trees are also reasonable.)

The bitwise operations in your example look a lot like a signature analysis shift register, that try to compress a long unique pattern of bits into a short, still-unique pattern.

like image 44
John R. Strohm Avatar answered Sep 30 '22 01:09

John R. Strohm


Basically, the thing you're trying to achieve with a hash function is to give all bits in the hash code a roughly 50% chance of being off or on given a particular item to be hashed. That way, it doesn't matter how many "buckets" your hash table has (or put another way, how many of the bottom bits you take in order to determine the bucket number)-- if every bit is as random as possible, then an item will always be assigned to an essentially random bucket.

Now, in real life, many people use hash functions that aren't that good. They have some randomness in some of the bits, but not all of them. For example, imagine if you have a hash function whose bits 6-7 are biased-- let's say in the typical hash code of an object, they have a 75% chance of being set. In this made up example, if our hash table has 256 buckets (i.e. the bucket number comes from bits 0-7 of the hash code), then we're throwing away the randomness that does exist in bits 8-31, and a smaller portion of the buckets will tend to get filled (i.e. those whose numbers have bits 6 and 7 set).

The supplementary hash function basically tries to spread whatever randomness there is in the hash codes over a larger number of bits. So in our hypothetical example, the idea would be that some of the randomness from bits 8-31 will get mixed in with the lower bits, and dilute the bias of bits 6-7. It still won't be perfect, but better than before.

like image 30
Neil Coffey Avatar answered Sep 30 '22 00:09

Neil Coffey