I want to get a unique numeric representation of a String. I know there are lots of ways of doing this, my question is which do you think is the best? I don't want to have negative numbers - so the hashcode() function in java is not so good, although I could override it ... but I'd rather not since I am not so confident and don't want to accidentally break something.
My Strings are all semantic-web URIS. The reason for the numeric representation is that when I display the data for a URI on a page I need something to pass into the query String or put into various fields in my javascript. The URI itself is too unwieldy and looks bad when you have a URI as a value in a URI.
Basically I want to have a class called Resource
which will look like this
Resource{
int id;
String uri;
String value; // this is the label or human readable name
// .... other code/getters/setters here
public int getId(){
return id = stringToIntFunction();
}
private int stringToIntFunction(String uri){
// do magic here
}
}
Can you suggestion a function that would do this if:
Also are there other issues that are important that I am not considering?
If you want it to be reversible, you're in trouble. Hashes are designed to be one-way.
In particular, given that an int
has 32 bits of information, and a char
has 16 bits of information, requiring reversibility means you can only have strings of zero, one or two characters (and even that's assuming that you're happy to encode "" as "\0\0" or something similar). That's assuming you don't have any storage, of course. If you can use storage, then just store numbers sequentially... something like:
private int stringToIntFunction(String uri) {
Integer existingId = storage.get(uri);
if (existingId != null) {
return existingId.intValue();
}
return storage.put(uri);
}
Here storage.put()
would increase a counter internally, store the URI as being associated with that counter value, and return it. My guess is that that's not what you're after though.
Basically, to perform a reversible encryption, I'd use a standard encryption library having converted the string to a binary format first (e.g. using UTF-8). I would expect the result to be a byte[]
.
If it doesn't have to be reversible, I'd consider just taking the absolute value of the normal hashCode()
result (but mapping Integer.MIN_VALUE
to something specific, as its absolute value can't be represented as an int
).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With