Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A two way String hash function

I want to get a unique numeric representation of a String. I know there are lots of ways of doing this, my question is which do you think is the best? I don't want to have negative numbers - so the hashcode() function in java is not so good, although I could override it ... but I'd rather not since I am not so confident and don't want to accidentally break something.

My Strings are all semantic-web URIS. The reason for the numeric representation is that when I display the data for a URI on a page I need something to pass into the query String or put into various fields in my javascript. The URI itself is too unwieldy and looks bad when you have a URI as a value in a URI.

Basically I want to have a class called Resource which will look like this

Resource{
  int id;
  String uri;
  String value; // this is the label or human readable name

  // .... other code/getters/setters here

  public int getId(){
    return id = stringToIntFunction();
  }

  private int stringToIntFunction(String uri){
  // do magic here
  }
}

Can you suggestion a function that would do this if:

  1. It had to be two way, that is you could also recover the original string from the numeric value
  2. It doesn't have to be two way

Also are there other issues that are important that I am not considering?

like image 765
Ankur Avatar asked Nov 28 '22 10:11

Ankur


1 Answers

If you want it to be reversible, you're in trouble. Hashes are designed to be one-way.

In particular, given that an int has 32 bits of information, and a char has 16 bits of information, requiring reversibility means you can only have strings of zero, one or two characters (and even that's assuming that you're happy to encode "" as "\0\0" or something similar). That's assuming you don't have any storage, of course. If you can use storage, then just store numbers sequentially... something like:

private int stringToIntFunction(String uri) {
    Integer existingId = storage.get(uri);
    if (existingId != null) {
        return existingId.intValue();
    }
    return storage.put(uri);
}

Here storage.put() would increase a counter internally, store the URI as being associated with that counter value, and return it. My guess is that that's not what you're after though.

Basically, to perform a reversible encryption, I'd use a standard encryption library having converted the string to a binary format first (e.g. using UTF-8). I would expect the result to be a byte[].

If it doesn't have to be reversible, I'd consider just taking the absolute value of the normal hashCode() result (but mapping Integer.MIN_VALUE to something specific, as its absolute value can't be represented as an int).

like image 110
Jon Skeet Avatar answered Dec 05 '22 17:12

Jon Skeet