Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Any caveats to generating unique filenames for random images by running MD5 over the image contents?

I want to generate unique filenames per image so I'm using MD5 to make filenames.Since two of the same image could come from different locations, I'd like to actually base the hash on the image contents. What caveats does this present?

(doing this with PHP5 for what it's worth)

like image 346
Ben Throop Avatar asked Oct 15 '08 19:10

Ben Throop


2 Answers

It's a good approach. There is an extremely small possibility that two different images might hash to the same value, but in reality your data center has a greater probability of suffering a direct hit by an asteroid.

One caveat is that you should be careful when deleting images. If you delete an image record that points to some file and you delete the file too, then you may be deleting a file that has a different record pointing to the same image (that belongs to a different user, say).

like image 134
Greg Hewgill Avatar answered Sep 22 '22 12:09

Greg Hewgill


Given completely random file contents and a good cryptographic hash, the probability that there will be two files with the same hash value reaches 50% when the number of files is roughly 2 to (number of bits in the hash function / 2). That is, for a 128 bit hash there will be a 50% chance of at least one collision when the number of files reaches 2^64.

Your file contents are decidedly not random, but I have no idea how strongly that influences the probability of collision. This is called the birthday attack, if you want to google for more.

It is a probabilistic game. If the number of images will be substantially less than 2^64, you're probably fine. If you're still concerned, using a combination of SHA-1 plus MD5 (as another answer suggested) gets you to a total of 288 high-quality hash bits, which means you'll have a 50% chance of a collision once there are 2^144 files. 2^144 is a mighty big number. Mighty big. One might even say huge.

like image 36
DGentry Avatar answered Sep 19 '22 12:09

DGentry