 

Converting non-numeric String to Integer?

How can I convert a non-numeric String to an Integer?

I got for instance:

String unique = "FUBAR";

What's a good way to represent the String as an Integer with no collisions, e.g. "FUBAR" should always map to the same number and must never collide with any other String? For instance, String a = "A"; could be represented as the Integer 1, and so on. What is a method that does this (preferably for all Unicode strings, though in my case ASCII would be sufficient)?

asked Nov 01 '13 by Niklas Rosencrantz

3 Answers

This is impossible. Think about it: an Integer has only 32 bits, so by the pigeonhole principle there must exist at least two strings that map to the same Integer value, no matter what conversion technique you use. In fact, infinitely many strings end up sharing the same value...

If you're just looking for an efficient mapping, then I suggest that you just use the int returned by hashCode(), which is a full 32-bit value (and can therefore be negative).
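
For illustration, a small self-contained sketch of both points: hashCode() always returns the same value for the same string, but collisions are unavoidable (the classic pair "Aa" and "BB" both hash to 2112):

public class HashCodeDemo {
    public static void main(String[] args) {
        // Deterministic: the same string produces the same hash on every run.
        System.out.println("FUBAR".hashCode());

        // Not collision-free: two different strings with the same hash.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112
    }
}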

answered by Steve P.

You can map Strings to unique IDs using a lookup table, assigning an ID to each String as it is first seen. There is no way to do this generically for all possible Strings.

// Needs java.util.Map and java.util.HashMap.
private final Map<String, Integer> map = new HashMap<>();

public int idFor(String s) {
    Integer id = map.get(s);
    if (id == null)                     // first time this String is seen:
        map.put(s, id = map.size());    // assign it the next free id (0, 1, 2, ...)
    return id;
}
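
A rough usage sketch, wrapping the snippet above in a hypothetical StringIds class (the class name and main method are just for illustration):

import java.util.HashMap;
import java.util.Map;

class StringIds {
    private final Map<String, Integer> map = new HashMap<>();

    public int idFor(String s) {
        Integer id = map.get(s);
        if (id == null)
            map.put(s, id = map.size());
        return id;
    }

    public static void main(String[] args) {
        StringIds ids = new StringIds();
        System.out.println(ids.idFor("FUBAR")); // 0
        System.out.println(ids.idFor("A"));     // 1
        System.out.println(ids.idFor("FUBAR")); // 0 again: same String, same id
    }
}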

Note: having unique IDs doesn't guarantee there will be no collisions in a hash collection.

http://vanillajava.blogspot.co.uk/2013/10/unique-hashcodes-is-not-enough-to-avoid.html

answered by Peter Lawrey


If you know the character set used in your strings, then you can think of the string as a number with a base other than 10. For example, hexadecimal numbers contain the letters A to F.

Therefore, if you know that your strings only contain letters from an 8-bit character set, you can treat the string as a base-256 number. In pseudocode this would be:

number n;
for each letter in string
    n = 256 * n + (letter's position in character set)

If your character set contains 65535 characters, then just multiply 'n' by that number at each step. But beware: the 32 bits of an integer will overflow very quickly. You will probably need a type that can hold a larger number.
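
A minimal Java sketch of this idea, assuming the strings only contain 8-bit (ASCII/Latin-1) characters and using BigInteger so the result is not limited to 32 bits (the class and method names are just for illustration):

import java.math.BigInteger;

public class StringAsNumber {
    // Interprets the string as a base-256 number over its 8-bit character codes.
    static BigInteger encode(String s) {
        BigInteger n = BigInteger.ZERO;
        for (char c : s.toCharArray()) {
            // "letter's position in character set" is simply the char value here.
            n = n.multiply(BigInteger.valueOf(256)).add(BigInteger.valueOf(c));
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(encode("A"));     // 65
        System.out.println(encode("FUBAR")); // same number every time, unique per 8-bit string
    }
}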

answered by Torben