Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Chance of a duplicate hash when using first 8 characters of SHA1

If I have an index of URLs, and ID them by the first 8 characters of a SHA1 hash, what is the probability of two different URLs having identical IDs?

like image 534
zino Avatar asked May 31 '15 18:05

zino


People also ask

How likely are SHA1 collisions?

It should take 2^160 operations to find a collision with SHA1, however using the Birthday Paradox, we can have a probability of 50% of finding a SHA1 collision in about 2^80 operations.

Is SHA1 always unique?

A hash function such as SHA-1 is used to calculate an alphanumeric string that serves as the cryptographic representation of a file or a piece of data. This is called a digest and can serve as a digital signature. It is supposed to be unique and non-reversible.

Can two files have the same SHA-1 hash?

A: An SHA-1 hash value is a 40-character string that identifies the contents of a file. If two files have the same contents then it's probable they will have the same SHA-1 hash value. However, please note that it is possible to create two completely different files that have the same SHA-1 hash value.

Why is SHA1 not secure?

What Causes the Insecure SSL error? While SSL certificates are currently secure, Google considers the SHA-1 hash algorithm insecure after 2016. This is due to reports from some security companies, that online attackers could feasibly compromise SSL certificates keyed with SHA-1 hash.


2 Answers

@Teepeemm has correctly answered the related question ‘given a particular sequence of 8 hex digits, what is the chance of another SHA-1 hash appearing with the same 8 digits?’ It's a very small number.

What's at stake in this question, though, is a different question: ‘given a large number of 8-hex-digit sequences, what's the chance of any two of them being the same?’ As the first comment to the question points out, this is related to the birthday paradox, which is not ‘what's the chance of someone in the room having the same birthday as me?’, but instead ‘what's the chance of any two people in this room having the same birthday?’ As is reasonably well known, the chance of that is 50% with only 23 people.

The hash-collision problem is essentially the same problem, but generalised from N=365 days to N=16^8 8-byte sequences, which is about 4.30e9. That's the ‘generalised birthday problem’. Using the expression quoted there (n=sqrt(2*d*ln(1/(1-p))), with d=4.30e9 and p=0.5, we find a 50% chance of a collision with only 77000 trials. If you plot the corresponding function, you see the probability increases quite rapidly as the number of trials increases.

Even with 16 bytes of hash (so d=16^16) there's a 50% chance of a collision after only 5 billion trials.

Happy birthday!

like image 198
Norman Gray Avatar answered Nov 03 '22 20:11

Norman Gray


A SHA-1 hash has 40 base-16 digits. If you’re only looking at the first 8 of them, then the chance that a second url has the same 8 digits is (1/16)^8 ~ 2.32e-10. Actually, this doesn't depend on there being 40 digits to begin with, or even that it's SHA-1. The only assumption you need is that SHA-1 has the first 8 digits independent and identically distributed.

like image 28
Teepeemm Avatar answered Nov 03 '22 20:11

Teepeemm