
Find out the real file type

I am working on an ASP web page that handles file uploads. Only certain types of files are allowed to be uploaded, like .XLS, .XML, .CSV, .TXT, .PDF, .PPT, etc.

I have to decide whether a file really has the type its extension indicates. In other words, if a trojan.exe was renamed to harmless.pdf and uploaded, the application must be able to find out that the uploaded file is NOT a .PDF file.

What techniques would you use to analyze these uploaded files? Where can I get the best information about the format of these files?

asked Jan 16 '09 by Germstorm


People also ask

How do I find the actual file type?

Right-click the file and select Properties. In the Properties window, the Type of file entry shows the file type and extension.

What is true file type?

When set to scan the “true file type”, the scan engine examines the file header, rather than the file name, to ascertain the actual file type. For example, if the scan engine is set to scan all executable files and it encounters a file named “family.gif”, it does not assume the file is a graphic file.

What are the 3 types of files?

The types of files recognized by the system are either regular, directory, or special. All file types recognized by the system fall into one of these categories, although the operating system uses many variations of these basic types.


2 Answers

One way would be to check for certain signatures or magic numbers in the files. This page has a handy list of known file signatures and seems quite up to date:

http://www.garykessler.net/library/file_sigs.html
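A minimal Python sketch of that signature check, assuming the whitelist from the question. The byte values are standard magic numbers: `%PDF-` for PDF, the OLE2 compound-file header `D0 CF 11 E0 A1 B1 1A E1` shared by legacy .xls and .ppt files, and `MZ` for Windows executables. The function names, and the text heuristic used for .txt/.csv/.xml (which have no signature at all), are illustrative assumptions rather than any library's API.

```python
import codecs
from pathlib import Path

# Leading "magic number" bytes for some of the allowed types (checked at offset 0).
# Legacy .xls and .ppt are both OLE2 compound files, so the signature alone
# cannot tell them apart.
OLE2 = bytes.fromhex("D0CF11E0A1B11AE1")
SIGNATURES = {
    ".pdf": [b"%PDF-"],
    ".xls": [OLE2],
    ".ppt": [OLE2],
}
TEXT_TYPES = {".txt", ".csv", ".xml"}  # formats with no magic number
EXE_MAGIC = b"MZ"  # DOS/PE header at the start of .exe and .dll files


def looks_like_text(head: bytes) -> bool:
    """Heuristic for signature-less text formats: no NUL bytes and valid UTF-8.

    Assumes uploads are UTF-8; adjust for other encodings. An incremental
    decoder is used so a multi-byte character cut off at the end of `head`
    is not mistaken for invalid data.
    """
    if b"\x00" in head:
        return False
    try:
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return False
    return True


def content_matches_extension(filename: str, head: bytes) -> bool:
    """Return True if the first bytes of the upload fit its claimed extension."""
    ext = Path(filename).suffix.lower()
    if head.startswith(EXE_MAGIC):
        return False  # an executable never passes, whatever it is named
    if ext in SIGNATURES:
        return any(head.startswith(sig) for sig in SIGNATURES[ext])
    if ext in TEXT_TYPES:
        return looks_like_text(head)
    return False  # extension not on the whitelist


if __name__ == "__main__":
    renamed_exe = b"MZ\x90\x00\x03\x00"  # trojan.exe renamed to harmless.pdf
    print(content_matches_extension("harmless.pdf", renamed_exe))  # False
    print(content_matches_extension("report.pdf", b"%PDF-1.4\n"))  # True
```

In practice `head` would be the first few hundred bytes of the uploaded stream. A signature only proves the container format: a .ppt renamed to .xls still passes, and a genuine PDF can still carry malicious content.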

answered Sep 29 '22 by Kev


In other words if a trojan.exe was renamed to harmless.pdf and uploaded, the application must be able to find out that the uploaded file is NOT a .PDF file.

That's not really a problem. If a .exe was uploaded as a .pdf and you correctly served it back to the downloader as application/pdf, all the downloader would get is a broken PDF. They would have to manually rename it to .exe to be harmed.
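Serving the file back "correctly" means the Content-Type comes from the server's own whitelist, keyed on the stored extension, never from whatever type the uploader's browser claimed. A minimal Python sketch; `ALLOWED_TYPES` and `content_type_for` are hypothetical names, and the MIME strings are the conventional registrations for these formats.

```python
from pathlib import Path

# Hypothetical server-side whitelist: the Content-Type sent to the downloader
# is looked up here, not taken from the upload request.
ALLOWED_TYPES = {
    ".pdf": "application/pdf",
    ".xls": "application/vnd.ms-excel",
    ".ppt": "application/vnd.ms-powerpoint",
    ".csv": "text/csv",
    ".txt": "text/plain",
    ".xml": "application/xml",
}


def content_type_for(filename: str) -> str:
    """Map a stored file name to the Content-Type to serve it with."""
    ext = Path(filename).suffix.lower()
    try:
        return ALLOWED_TYPES[ext]
    except KeyError:
        raise ValueError(f"extension {ext!r} is not allowed") from None
```

With this in place, a renamed trojan.exe stored as harmless.pdf is always served as `application/pdf`, which is exactly the "broken PDF" case described above.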

The real problems are:

  1. Some browsers may sniff the content of the file and decide they know better than you about what type of file it is. IE is particularly bad at this: it tends to render the file as HTML if it sees any HTML tags lurking near the start of the file. This is particularly unhelpful, as it means script can be injected into your site, potentially compromising any application-level security (cookie stealing, etc.). Workarounds include always serving the file as an attachment using Content-Disposition, and/or serving files from a different hostname, so that the content can't cross-site-script back onto your main site.

  2. PDF files are not safe anyway! They can be full of scripting, and PDF readers have had significant security holes. Exploitation of a hole in the PDF reader browser plugin is currently one of the most common means of installing trojans on the web. And there's usually almost nothing you can do to detect the exploits, as they can be highly obfuscated.
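The attachment workaround from point 1 comes down to a few response headers. A Python sketch that builds them as a plain dict; `download_headers` is a hypothetical helper to be wired into whatever actually serves the downloads. The `X-Content-Type-Options: nosniff` header is an addition not mentioned in the answer: it asks browsers that support it not to override the declared type by sniffing.

```python
import re


def download_headers(stored_name: str, content_type: str) -> dict:
    """Build headers asking the browser to save the file rather than render it.

    A sketch, not a complete defence: serving uploads from a separate
    hostname, as the answer suggests, is still worthwhile.
    """
    # Keep only conservative characters so the name cannot break out of the
    # quoted header value (quotes, CR/LF, semicolons, etc. become "_").
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", stored_name) or "download"
    return {
        "Content-Type": content_type,
        "Content-Disposition": f'attachment; filename="{safe_name}"',
        # Not in the original answer: tells supporting browsers to skip sniffing.
        "X-Content-Type-Options": "nosniff",
    }
```

Because the file name is reduced to a safe character set, an uploader cannot use it to inject extra headers into the download response.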

answered Sep 29 '22 by bobince