
Is it a good idea to use a GUID in the names of files uploaded by users?

Tags:

c#

.net

guid

I'm building an application (a CMS) where users can upload files such as images.

My question is how to rename these files when saving them.

I think generating a GUID (System.Guid.NewGuid()) to name each saved file is the best way to go. Am I right, or is there a better approach in this case?

Note: An example of a generated GUID: 7c9e6679-944b-7425-40de-e07fc1f90ae7. In that case an image file would be named: 7c9e6679-944b-7425-40de-e07fc1f90ae7.jpg
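For illustration, here is a minimal sketch of the GUID naming idea described above. The class and method names (UploadNaming, NewGuidFileName) are hypothetical and not part of any library; the sketch assumes the extension is taken from the name the user uploaded.

```csharp
using System;
using System.IO;

public static class UploadNaming
{
    // Hypothetical helper: builds a storage name from a fresh GUID plus
    // the lower-cased extension of the originally uploaded file name.
    public static string NewGuidFileName(string originalFileName)
    {
        // Path.GetExtension returns the extension including the leading dot,
        // or an empty string when the name has no extension.
        string extension = Path.GetExtension(originalFileName).ToLowerInvariant();

        // "D" is the hyphen-separated 8-4-4-4-12 format (36 characters).
        return Guid.NewGuid().ToString("D") + extension;
    }
}
```

Calling UploadNaming.NewGuidFileName("Photo.JPG") would give a different name on every call, each ending in ".jpg". Since users never see the name, the original file name would have to be stored elsewhere if it is needed later.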

Update:

Users will not interact directly with the name of the file.

Acaz Souza asked Oct 02 '11 18:10



1 Answer

Yes. But a more convenient scheme would probably be to use a hash sum (say, the MD5 sum) of the contents.

That way,

  • the generation of the filename is repeatable (in case something goes wrong, data needs to be migrated to a different server, content is shared across different installations, etc.).
  • you'd automatically share duplicate uploads. Of course, you'd then need to track who owns the file (and not delete it until the last usage is deleted).

Note: An example of a typical md5sum is 5eb63bbbe01eeed093cb22bb8f5acdc3 (for ASCII/UTF-8 "hello world").
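A rough sketch of how such a content-based name could be computed in C#. The names ContentNaming, Md5Hex and ContentFileName are made up for this example; only the MD5 class from System.Security.Cryptography is a real API.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ContentNaming
{
    // Returns the MD5 sum of the content as 32 lower-case hex characters.
    public static string Md5Hex(byte[] content)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(content);
            var sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                sb.Append(b.ToString("x2")); // two hex digits per byte
            }
            return sb.ToString();
        }
    }

    // Hypothetical helper: the same bytes always yield the same file name.
    public static string ContentFileName(byte[] content, string extension)
    {
        return Md5Hex(content) + extension;
    }
}
```

For the "hello world" example, ContentNaming.Md5Hex(Encoding.ASCII.GetBytes("hello world")) returns 5eb63bbbe01eeed093cb22bb8f5acdc3, so two users uploading identical bytes end up with the same name.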

Edit in response to the comments (about hash collisions): True enough, you might get hash collisions with very large sets of documents. In that case, it is most common to use the hash sum + the length of a file to identify the 'content blob'. So you'd do something like:

 http://cms.mysite.local/docs/123986/5e/b63bbbe/01eeed093cb22bb8f5acdc3.png

for a PNG of roughly 123 KB (the leading 123986 being the file length in bytes).
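A sketch of how a path in that shape could be assembled, assuming the hash is split into 2, 7 and 23 hex characters exactly as in the example URL. BlobPaths and BuildRelativePath are hypothetical names, and the split sizes are read off the example rather than any standard.

```csharp
using System;
using System.Globalization;

public static class BlobPaths
{
    // Hypothetical helper mirroring the example layout:
    //   <length>/<first 2 hex chars>/<next 7 hex chars>/<remaining 23 hex chars><extension>
    public static string BuildRelativePath(long length, string md5Hex, string extension)
    {
        if (md5Hex == null || md5Hex.Length != 32)
        {
            throw new ArgumentException("Expected a 32-character hex MD5 string.", nameof(md5Hex));
        }

        return string.Join("/",
            length.ToString(CultureInfo.InvariantCulture), // file length in bytes
            md5Hex.Substring(0, 2),
            md5Hex.Substring(2, 7),
            md5Hex.Substring(9) + extension);
    }
}
```

With length 123986, the hash 5eb63bbbe01eeed093cb22bb8f5acdc3 and extension ".png", this yields 123986/5e/b63bbbe/01eeed093cb22bb8f5acdc3.png. Two files would then have to match in both length and MD5 sum to be treated as the same content blob.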

sehe answered Oct 06 '22 01:10