The problem sounds pretty common but somehow I can't find something which helps me with that...
I feel like I'm lacking fundamental hashing and encryption knowledge.
Problem
Suppose that I have a phone number which is (hopefully unique and) used as ID.
But I don't want to use my private number as ID in public interfaces.
What I need is a solution which obfuscates a string one way but still keeping the uniqueness so when someone else uses the algorithm he will get the same ID.
Solution (?)
Is there a hash algorithm which guarantees uniqueness when the input does not exceed the hashed output length but remains still (almost) impossible to reverse.
What about using a fixed public key RSA encryption? The output should be unique but the attacker would have to break a single key decrypt all numbers. Sounds like a bad idea...
Update (based on answer)
Obviously I'm searching for a cryptographic hash algorithm with a low collision probability.
Now (that I have caught some sleep and) thought that through there are some more facts I can think of:
With that being said: I can decide on using a hash. This way no one can tell immediately (without attacking it) which phone number is used. That seems to be the whole point.
What you basically want is a Hashing Algorithm (as your question states). But where it gets tricky is the two lines:
Depending on the input length you can prove uniqueness (or non collision) yourself with a few for loops and some time. So for your phone number example, you could easily prove out that all phone numbers for a SHA1 has do not collide.
If your input space is large you can take comfort in that modern hash function (like SHA-1 or SHA-3) have a very low probability of colliding (birthday problem), but there are no guarantees. Although people have been trying for a long time to find a collision for SHA-1 and have found them, I think that current cost to break a single SHA1 has was 2 million in a project called HashClash. Currently, the recommendation is for people to move on to SHA-3, where there has been no collisions detected. (collisions for SHA-1 I think required something like 2^51 operations to find so it may be good enough for your needs).
For the second part of your question, "remaining impossible to reverse". You can strive for making something computationally infeasible. But with infinite time an attacker can reverse any hash.
This link is an excellent examination of non cryptographic current hash algorithms. Unfortunately you probably cannot use any of these mentioned in the article because you need to be resistant to reversing, so you do not want a fast hashing algorithm. Slower algorithms make things more computationally infeasible.
Lets assume for a second that the attacker knows that the 160 bit SHA1 hash (or whatever hash you are using) is a phone number. In that case it would not be hard for him to create a rainbow table for every possible phone numbers hash value. This is true for any hash algorithm. What people normally do to avoid this is to Salt the original phrase. This helps to make building a rainbow table infeasible because the Salt is secret and number of possibilities are huge.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With