
How can I determine a file's true extension/type programmatically?

I am working on a script that will process user uploads to the server, and as an added layer of security I'd like to know:

Is there a way to detect a file's true extension/file type, and ensure that it is not another file type masked with a different extension?

Is there a byte stamp or some unique identifier for each type/extension?

I'd like to be able to detect that someone hasn't applied a different extension onto the file they are uploading.

barfoon asked Jan 26 '09 22:01


2 Answers

Not really, no.

You will need to read the first few bytes of each file and interpret them as a header for one of a finite set of known file types. Most file formats have a distinctive header: some metadata in the first few bytes, or in the first few kilobytes in the case of MP3.

Your program will simply have to try parsing the file as each of your accepted file types.
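
As a rough illustration of the header check described above, here is a minimal sketch in PHP. The function name `sniffFileType` and the signature table are my own assumptions for illustration; the table covers only four common formats and is nowhere near exhaustive.

```php
<?php
/**
 * Hypothetical helper: guess a file type from its leading bytes.
 * The signature table is illustrative, not exhaustive.
 */
function sniffFileType(string $path): ?string
{
    // Known leading byte sequences ("magic numbers") for a few formats.
    $signatures = [
        'image/jpeg'      => ["\xFF\xD8\xFF"],
        'image/png'       => ["\x89PNG\r\n\x1A\n"],
        'image/gif'       => ['GIF87a', 'GIF89a'],
        'application/pdf' => ['%PDF-'],
    ];

    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null; // Could not open the file.
    }
    // 8 bytes covers the longest signature in the table above.
    $head = fread($handle, 8);
    fclose($handle);
    if ($head === false) {
        return null;
    }

    foreach ($signatures as $type => $prefixes) {
        foreach ($prefixes as $prefix) {
            // Compare only as many bytes as the signature is long.
            if (strncmp($head, $prefix, strlen($prefix)) === 0) {
                return $type;
            }
        }
    }
    return null; // Unknown or unsupported type.
}
```

A match only tells you the file starts like that format. It says nothing about whether the rest of the file is valid, so treat it as a first filter rather than proof.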

For my program, I send the uploaded image to ImageMagick in a try-catch block, and if it blows up, I assume it was a bad image. This should be considered insecure, because I am loading arbitrary (user-supplied) binary data into an external program, which is generally an attack vector. Here, I am trusting ImageMagick not to do anything to my system.

I recommend writing your own handlers for the significant file types you intend to use, to reduce your exposure to that kind of attack.

Edit: I see in PHP there are some tools to do this for you.

Also, the MIME type is whatever the user's browser claims the file to be. It is handy to read it and act on it in your code, but it is not a secure method, because anyone sending you bad files can easily fake the MIME headers. It's a front-line defense to keep code that expects a JPEG from barfing on a PNG, but if someone embedded a virus in a .exe and renamed it to .jpg, there's no reason they wouldn't also have spoofed the MIME type.
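
To make the point about spoofed MIME headers concrete, here is a small sketch that compares the claimed type against what the file's bytes suggest. It assumes PHP's Fileinfo extension is available; the function name `claimedTypeMatchesContent` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: compare the browser-supplied MIME type with the
 * type detected from the file's bytes. Assumes the Fileinfo extension.
 */
function claimedTypeMatchesContent(string $path, string $claimedType): bool
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $detected = $finfo->file($path);

    // A detection failure is treated as a mismatch.
    return $detected !== false
        && strcasecmp($detected, trim($claimedType)) === 0;
}
```

In an upload handler you would probably call it with something like `$_FILES['upload']['tmp_name']` and `$_FILES['upload']['type']`, where `upload` is just an assumed form field name. Even when the two agree, the claimed value is still attacker-controlled, so the detected type is the one to act on.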

Karl answered Oct 03 '22 20:10


PHP has a couple of ways of reading a file's contents to determine its MIME type, depending on which version of PHP you are using:

Have a look at the Fileinfo functions if you're running PHP 5.3+:

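// FILEINFO_MIME returns the type plus charset, e.g. "image/jpeg; charset=binary";
// use FILEINFO_MIME_TYPE instead if you only want the bare type.
$finfo = finfo_open(FILEINFO_MIME);
$type = finfo_file($finfo, $filepath);
finfo_close($finfo);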

Alternatively, check out mime_content_type for older versions.

$type = mime_content_type($filepath);
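
Since the question was specifically about a mismatched extension, here is a sketch of how the detected type could be checked against the uploaded file name. It assumes the Fileinfo extension; the function name `extensionMatchesContent` and the allowlist are hypothetical and should be adapted to the types you actually accept.

```php
<?php
/**
 * Hypothetical helper: check that a file's extension agrees with the MIME
 * type detected from its contents. Assumes the Fileinfo extension.
 */
function extensionMatchesContent(string $path, string $originalName): bool
{
    // Illustrative allowlist: detected MIME type => acceptable extensions.
    $allowed = [
        'image/jpeg' => ['jpg', 'jpeg'],
        'image/png'  => ['png'],
        'image/gif'  => ['gif'],
    ];

    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);

    // Extension as supplied by the uploader, compared case-insensitively.
    $extension = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));

    return $type !== false
        && isset($allowed[$type])
        && in_array($extension, $allowed[$type], true);
}
```

Anything whose detected type is not in the allowlist is rejected regardless of its extension, which is usually the safer default for uploads.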

Note that just validating the file type isn't enough if you want to be truly secure. Someone could, for example, upload a valid JPEG file that exploits a vulnerability in a common renderer. To guard against this, you would need a well-maintained virus scanner.

Paul Dixon answered Oct 03 '22 18:10