Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How unique are the first 8-12 characters of SHA256 hashes?

Take this hash for example:

ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad

It's too long for my purposes so I intend to use a small chunk from it, such as:

ba7816bf8f01
ba7816bf

Or similar. My intended use case:

  • Video gallery on a website, represented by thumbnails. They are in random order.
  • They play in the lightbox. They don't have a unique ID, only their URL is unique.
  • While the lightbox is open I add something to the end of the page URL with JS History API.

//example.com/video-gallery/lightbox/ba7816bf8f01

  • The suffix needs to be short and simple, definitely not a URL.
  • People share the URL.
  • The server can make sense of the lightbox/ba7816bf8f01 in relation to /video-gallery.
  • Visiting the URL, the lightbox needs to find which video the suffix belongs to and play it.

I thought I'd SHA256 the URL of the video, use the first few characters as an ad-hoc ID. How many characters should I use from the generated hash, to considerably reduce the chance of collision?

I got the idea from URLs and Hashing by Google.

like image 502
Firsh - justifiedgrid.com Avatar asked Mar 11 '18 19:03

Firsh - justifiedgrid.com


1 Answers

The Wikipedia page on birthday attacks has a table with the number of entries you need to produce a certain chance of collision with a certain number of bits as a random identifier. If you want to have a one in a million chance of a collision and expect to store a million documents, for example, you’ll need fewer than 64 bits (16 hex characters).

Base64 is a good way to fit more bits into the same length of string compared to hex, too, taking 1⅓ characters per byte instead of 2.

like image 55
Ry- Avatar answered Oct 03 '22 02:10

Ry-