Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Two-way "Hashing" of string

Tags:

c++

hash

I want to generate int from a string and be able to generate it back. Something like hash function but two-way function. I want to use ints as ID in my application, but want to be able to convert it back in case of logging or debugging.

Like:

int id = IDProvider::getHash("NameOfMyObject");

object * a = createObject(id);

...

if(error)
{
    LOG(IDProvider::getOriginalString(a->getId()), "some message");
}

I have heard of slightly modified CRC32 to be fast and 100% reversible, but I can not find it and I am not able to write it by myself.

Any hints what should I use? Thank you!

edit I have just founded the source I have the whole CRC32 thing from:

Jason Gregory : Game Engine Architecture

quotation:

"As with any hashing system, collisions are a possibility (i.e., two different strings might end up with the same hash code). However, with a suitable hash function, we can all but guarantee that collisions will not occur for all reasonable input strings we might use in our game. After all, a 32-bit hash chode represents more than four billion possible values. So if our hash function does a good job of distributing strings evently throughout this very large range, we are unlikely to collide. At Naughty Dog, we used a variant of the CRC-32 algorithm to hash our strings, and we didn't encounter a single collision in over two years of development on Uncharted: Drake's Fortune."

like image 473
relaxxx Avatar asked Oct 30 '11 18:10

relaxxx


4 Answers

Reducing an arbitrary length string to a fixed size int is mathematically impossible to reverse. See Pidgeonhole principle. There is a near infinite amount of strings, but only 2^32 32 bit integers.

32 bit hashes(assuming your int is 32 bit) can have collisions very easily. So it's not a good unique ID either.

There are hashfunctions which allow you to create a message with a predefined hash, but it most likely won't be the original message. This is called a pre-image.

For your problem it looks like the best idea is creating a dictionary that maps integer-ids to strings and back.


To get the likelyhood of a collision when you hash n strings check out the birthday paradox. The most important property in that context is that collisions become likely once the number of hashed messages approaches the squareroot of the number of available hash values. So with a 32 bit integer collisions become likely if you hash around 65000 strings. But if you're unlucky it can happen much earlier.

like image 160
CodesInChaos Avatar answered Nov 14 '22 02:11

CodesInChaos


I have exactly what you need. It is called a "pointer". In this system, the "pointer" is always unique, and can always be used to recover the string. It can "point" to any string of any length. As a bonus, it also has the same size as your int. You can obtain a "pointer" to a string by using the & operand, as shown in my example code:

#include <string>
int main() {
    std::string s = "Hai!";
    std::string* ptr = &s; // this is a pointer
    std::string copy = *ptr; // this retrieves the original string
    std::cout << copy; // prints "Hai!"
}
like image 43
Puppy Avatar answered Nov 14 '22 04:11

Puppy


What you need is encryption. Hashing is by definition one way. You might try simple XOR Encryption with some addition/subtraction of values.

  • Reversible hash function?
  • How come MD5 hash values are not reversible?
  • checksum/hash function with reversible property
  • http://groups.google.com/group/sci.crypt.research/browse_thread/thread/ffca2f5ac3093255

... and many more via google search...

like image 4
HashIsGood Avatar answered Nov 14 '22 04:11

HashIsGood


You could look at perfect hashing

http://en.wikipedia.org/wiki/Perfect_hash_function

It only works when all the potential strings are known up front. In practice what you enable by this, is to create a limited-range 'hash' mapping that you can reverse-lookup.

In general, the [hash code + hash algorithm] are never enough to get the original value back. However, with a perfect hash, collisions are by definition ruled out, so if the source domain (list of values) is known, you can get the source value back.

gperf is a well-known, age old program to generate perfect hashes in c/c++ code. Many more do exist (see the Wikipedia page)

like image 3
sehe Avatar answered Nov 14 '22 02:11

sehe