Is it possible to get identical SHA1 hash? [duplicate]

Given two different strings S1 and S2 (S1 != S2) is it possible that:

SHA1(S1) == SHA1(S2)

is True?

If yes - with what probability?
If not - why not?
Is there an upper bound on the length of an input string for which the probability of getting duplicates is 0? Or is the calculation of SHA1 (and hence the probability of duplicates) independent of the length of the string?

The goal I am trying to achieve is to hash some sensitive ID string (possibly joined together with some other fields like parent ID), so that I can use the hash value as an ID instead (for example in the database).

Example:

Resource ID: X123 Parent ID: P123

I don't want to expose the nature of my resource identifiers by letting the client see "X123-P123".

Instead I want to create a new column hash("X123-P123"), let's say it's AAAZZZ. Then the client can request the resource with id AAAZZZ and not know about my internal IDs etc.
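A minimal sketch of that idea in Python, using the standard `hashlib` module. The names `public_id`, `register` and `lookup` are hypothetical, and the dictionary is only a stand-in for the database column; a real table would more likely rely on a unique constraint.

```python
import hashlib

def public_id(resource_id: str, parent_id: str) -> str:
    """Return the SHA-1 hex digest of "<resource_id>-<parent_id>"."""
    internal = f"{resource_id}-{parent_id}"
    return hashlib.sha1(internal.encode("utf-8")).hexdigest()

# Hypothetical in-memory stand-in for the new database column.
lookup = {}

def register(resource_id: str, parent_id: str) -> str:
    """Store the mapping and refuse to overwrite a different internal ID."""
    pid = public_id(resource_id, parent_id)
    existing = lookup.get(pid)
    if existing is not None and existing != (resource_id, parent_id):
        raise ValueError("hash collision between two internal IDs")
    lookup[pid] = (resource_id, parent_id)
    return pid

token = register("X123", "P123")
print(token)  # 40 hexadecimal characters
```

The explicit check in `register` is what turns "collisions are improbable" into "a collision would be noticed", which is the part the question is really worried about.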

261

asked Mar 19 '10 17:03

Andriy Drozdyuk

2 Answers

What you describe is called a collision. Collisions necessarily exist, since SHA-1 accepts many more distinct messages as input than it can produce distinct outputs (SHA-1 may eat any string of bits up to 2^64 bits, but outputs only 160 bits; thus, at least one output value must pop up several times). This observation is valid for any function with an output smaller than its input, regardless of whether the function is a "good" hash function or not.

Assuming that SHA-1 behaves like a "random oracle" (a conceptual object which basically returns random values, with the sole restriction that once it has returned output v on input m, it must always thereafter return v on input m), then the probability of collision, for any two distinct strings S1 and S2, should be 2^(-160). Still under the assumption of SHA-1 behaving like a random oracle, if you collect many input strings, then you shall begin to observe collisions after having collected about 2^80 such strings.

(That's 2^80 and not 2^160 because, with 2^80 strings you can make about 2^159 pairs of strings. This is often called the "birthday paradox" because it comes as a surprise to most people when applied to collisions on birthdays. See the Wikipedia page on the subject.)
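The birthday effect can be observed on a scaled-down model. The sketch below keeps only the top 32 bits of SHA-1 (an assumption made purely so the search finishes quickly; the helper names are hypothetical) and hashes the strings "0", "1", "2", ... until two of them share the same truncated value. Under the random-oracle assumption this should take on the order of 2^16 attempts rather than 2^32.

```python
import hashlib
from itertools import count

def truncated_sha1(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-1(data), as an integer."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - bits)

def find_collision(bits: int):
    """Hash "0", "1", "2", ... until two inputs share a truncated hash."""
    seen = {}  # truncated hash -> first message that produced it
    for i in count():
        msg = str(i).encode()
        h = truncated_sha1(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg

first, second, tries = find_collision(32)
print(first, second, tries)
```

The same search against the full 160 bits would need about 2^80 stored hashes, which is why it is not a practical way to find a real SHA-1 collision.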

Now we strongly suspect that SHA-1 does not really behave like a random oracle. The birthday-paradox approach is the optimal collision-search algorithm for a random oracle, yet there is a published attack which should find a collision in about 2^63 steps, hence 2^17 = 131072 times faster than the birthday-paradox algorithm. Such an attack should not be doable on a true random oracle. For years this attack was not actually completed and remained theoretical (some people tried but apparently could not find enough CPU power). (Update: as of early 2017, somebody did compute a SHA-1 collision with the above-mentioned method, and it worked exactly as predicted.) The theory looks sound, and it really seems that SHA-1 is not a random oracle. Correspondingly, as for the probability of collision, well, all bets are off.

As for your third question: for a function with an n-bit output, there necessarily are collisions if you can input more than 2^n distinct messages, i.e. if the maximum input message length m (in bits) is greater than n. With a bound m lower than n, the answer is not as easy. If the function behaves as a random oracle, then the probability of the existence of a collision decreases as m decreases, and not linearly: there is a steep cutoff around m = n/2. This is the same analysis as for the birthday paradox. With SHA-1, this means that if m < 80 then chances are that there is no collision, while m > 80 makes the existence of at least one collision very probable (with m > 160 this becomes a certainty).
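The cutoff can be illustrated on a toy version of the problem. In this sketch the output is SHA-1 truncated to n = 16 bits (an assumption, chosen so that every input can be enumerated), and the 2^m inputs are the integers below 2^m encoded on 4 bytes; `has_collision` is a hypothetical helper name. The pigeonhole argument guarantees a collision for m = 17, and the birthday analysis suggests collisions should start to appear for m around 8.

```python
import hashlib

N_BITS = 16  # toy output size standing in for SHA-1's 160 bits

def truncated_sha1(data: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-1(data), as an integer."""
    digest = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return digest >> (160 - bits)

def has_collision(m: int, n: int = N_BITS) -> bool:
    """True if two of the 2^m inputs share an n-bit truncated hash."""
    seen = set()
    for value in range(2 ** m):
        h = truncated_sha1(value.to_bytes(4, "big"), n)
        if h in seen:
            return True
        seen.add(h)
    return False

# One line per input size m: does any collision exist among the 2^m inputs?
for m in range(2, N_BITS + 2):
    print(m, has_collision(m))
```

Because each input set contains the previous one, the printed answers can only switch from False to True once, which makes the cutoff easy to spot.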

Note that there is a difference between "there exists a collision" and "you find a collision". Even when a collision must exist, you still have your 2^(-160) probability every time you try. What the previous paragraph means is that such a probability is rather meaningless if you cannot (conceptually) try 2^160 pairs of strings, e.g. because you restrict yourself to strings of fewer than 80 bits.

200

answered Oct 13 '22 04:10

Thomas Pornin

Yes, it is possible, because of the pigeonhole principle.

Most hashes (SHA-1 included) have a fixed output length, while the input is of arbitrary size. So if you try long enough, you can find collisions.

However, cryptographic hash functions (like the SHA family, the MD family, etc.) are designed to make such collisions hard to find. The best attack known takes about 2^63 operations to find a collision deliberately. The chance that two given inputs collide by accident is far smaller still (about 2^(-160) for an ideal 160-bit hash), which is 0 in practice.

answered Oct 13 '22 04:10

Henri

Tags:

cryptography

hash

sha1

checksum
