
Should uploaded files be renamed?

I've been reading up on PHP file upload security and a few articles have recommended renaming the files. For example, the OWASP article Unrestricted File Upload says:

It is recommended to use an algorithm to determine the filenames. For instance, a filename can be a MD5 hash of the name of file plus the date of the day.

If a user uploads a file named Cake Recipe.doc, is there really any reason to rename it to 45706365b7d5b1f35?

If the answer is yes, for whatever reason, then how do you keep track of the original file name and extension?

Nate asked Jul 25 '13 18:07





1 Answer

To your primary question, whether it is good practice to rename files, the answer is a definite yes, especially if you are creating some form of file repository where users upload files (and filenames) of their choosing, for several reasons:

  1. Security -- if you have a poorly written application that allows the download of files by name or through direct access (it's horrid, but it happens), it's much harder for a user, whether maliciously or by accident, to "guess" the names of files.
  2. Uniqueness -- the likelihood of two different people uploading a file of the same name is very high (e.g. avatar.gif, readme.txt, video.avi, etc). The use of a unique identifier significantly decreases the likelihood that two files will have the same name.
  3. Versioning -- it is much easier to keep multiple "versions" of a document using unique names. It also avoids the need for additional code that parses a filename in order to change it. A simple example would be renaming document.pdf to document(1).pdf, which becomes more complicated once you account for users' ability to create horrible names for things.
  4. Length -- working with known filename lengths is always better than working with unknown filename lengths. I always know that (my filepath) + (X letters) is a certain length, whereas the length of (my filepath) + (random user filename) is completely unknown.
  5. OS -- the length issue above can also create problems when attempting to write arbitrary or very long filenames to a drive. You have to account for special characters, length limits, and the risk of truncated filenames (the user may not receive a working file because the extension has been cut off).
  6. Execution -- it's easy for the OS (or the web server) to execute a file whose name ends in .exe, .php, or (insert other extension). It's much harder when there isn't an extension.
  7. URL encoding -- ensuring the name is URL safe. Cake Recipe.doc is not a URL safe name, and on some systems (either server or browser side) or in some situations it can cause inconsistencies where the name is expected to be a urlencoded value.
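A renamed file can still be handed back under the user's original name at download time, which sidesteps the URL-safety issue in point 7. The following is a minimal sketch, not something from the answer itself: the helper name `contentDispositionFor` is hypothetical, and it assumes the original name has been kept somewhere alongside the stored name.

```php
<?php
// Hypothetical helper: builds a Content-Disposition header value so a file
// stored under a generated name is downloaded under its original name.
function contentDispositionFor($originalName)
{
    // Drop any path component the client may have sent along with the name.
    $name = basename(str_replace('\\', '/', $originalName));

    // ASCII fallback for older clients: replace anything outside a safe set.
    $fallback = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);

    // The extended filename* parameter carries the percent-encoded UTF-8 name.
    return sprintf(
        'attachment; filename="%s"; filename*=UTF-8\'\'%s',
        $fallback,
        rawurlencode($name)
    );
}

// Prints the header value that would be sent for the question's example file.
echo contentDispositionFor('Cake Recipe.doc'), PHP_EOL;
```

In a real download script the returned string would be passed to `header('Content-Disposition: ' . ...)` before streaming the stored file.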

As for storing the information, you would typically do this in a database. That is no different from the need you already have, since you need a way to refer back to the file (who uploaded it, what its name is, occasionally where it is stored, the time of upload, sometimes the size). You are simply storing the generated name of the file in addition to the user's name for the file.
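As a rough illustration of that bookkeeping, the sketch below collects the fields mentioned above into one record. The function name `buildUploadRecord` and the field names are assumptions made for the example, and the actual insert into a database table is left out.

```php
<?php
// Hypothetical helper: gathers the metadata worth persisting for one upload.
// $file is shaped like one entry of PHP's $_FILES array; the keys of the
// returned array are assumed column names, not a prescribed schema.
function buildUploadRecord(array $file, $storedName, $userId)
{
    return [
        'user_id'       => (int) $userId,
        // The user's own name for the file, without any client-side path.
        'original_name' => basename(str_replace('\\', '/', $file['name'])),
        // The generated name the file actually has on disk.
        'stored_name'   => $storedName,
        'size_bytes'    => (int) $file['size'],
        'uploaded_at'   => gmdate('Y-m-d H:i:s'),
    ];
}

// Example values only; the stored name here is a placeholder.
$record = buildUploadRecord(
    ['name' => 'Cake Recipe.doc', 'size' => 24576],
    '20130725_' . str_repeat('a', 32),
    42
);
print_r($record);
```

Looking a file up by `stored_name` then gives back `original_name`, extension included, so nothing about the user's naming is lost by the rename.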

The OWASP recommendation isn't a bad one: using the filename and a timestamp (not just the date) would be mostly unique. I take it a step further and include the microtime with the timestamp, and often some other unique bit of information, so that duplicate uploads of a small file within the same timeframe cannot collide. I also store the date of the upload, which is additional insurance against MD5 clashes, since those become more probable in systems that store many files over many years. It is incredibly unlikely that you would generate two identical MD5s from filename and microtime on the same day. An example would be:

$filename = date('Ymd') . '_' . md5($uploaded_filename . microtime());
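The one-liner above can be wrapped in a small function, which also makes the fixed-length property from point 4 easy to see. The sketch below is illustrative: both function names are hypothetical, and the second function is an assumed variant that is not part of the answer, swapping the MD5 of guessable inputs for bytes from `random_bytes()` (available since PHP 7).

```php
<?php
// The answer's scheme: date prefix plus an MD5 of the original name and microtime.
function storedNameFromHash($uploadedFilename)
{
    return date('Ymd') . '_' . md5($uploadedFilename . microtime());
}

// Assumed variant (not from the answer): 16 random bytes, hex encoded,
// so the name does not depend on inputs an outsider could guess.
function storedNameFromRandom()
{
    return date('Ymd') . '_' . bin2hex(random_bytes(16));
}

// Both produce 8 date digits, an underscore, and 32 hex characters.
echo storedNameFromHash('Cake Recipe.doc'), PHP_EOL;
echo storedNameFromRandom(), PHP_EOL;
```

Either way the stored name is always 41 characters long and carries no extension, whatever the user called the file.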

My 2 cents.

Jacob S answered Oct 06 '22 09:10
