Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How is mime type of an uploaded file determined by browser?

I have a web app where the user needs to upload a .zip file. On the server-side, I am checking the mime type of the uploaded file, to make sure it is application/x-zip-compressed or application/zip.

This worked fine for me on Firefox and IE. However, when a coworker tested it, it failed for him on Firefox (sent mime type was something like "application/octet-stream") but worked on Internet Explorer. Our setups seem to be identical: IE8, FF 3.5.1 with all add-ons disabled, Win XP SP3, WinRAR installed as native .zip file handler (not sure if that's relevant).

So my question is: How does the browser determine what mime type to send?

Please note: I know that the mime type is sent by the browser and, therefore, unreliable. I am just checking it as a convenience--mainly to give a more friendly error message than the ones you get by trying to open a non-zip file as a zip file, and to avoid loading the (presumably heavy) zip file libraries.

like image 308
Kip Avatar asked Jul 29 '09 17:07

Kip


People also ask

How is file MIME type determined?

MIME types are defined by three attributes: language (lang), encoding (enc), and content-type (type). At least one of these attributes must be present for each type. The most commonly used attribute is type. The server frequently considers the type when deciding how to generate the response to the client.

What is a MIME type and how does a Web server make use of it?

MIME types describe the media type of content, either in email, or served by web servers or web applications. They are intended to help provide a hint as to how the content should be processed and displayed. Examples of MIME types: text/html for HTML documents.

Is MIME used in web browsing?

MIME stands for "Multipurpose Internet Mail Extensions". The concept of MIME Types was originally created to be used with email programs; hence the "Mail" part of the acronym. MIME Types are now also used by web browsers.

Who set MIME type of file?

MIME types are defined and standardized in IETF's RFC 6838. The Internet Assigned Numbers Authority (IANA) is responsible for all official MIME types, and you can find the most up-to-date and complete list at their Media Types page.


2 Answers

Chrome

Chrome (version 38 as of writing) has 3 ways to determine the MIME type and does so in a certain order. The snippet below is from file src/net/base/mime_util.cc, method MimeUtil::GetMimeTypeFromExtensionHelper.

// We implement the same algorithm as Mozilla for mapping a file extension to // a mime type.  That is, we first check a hard-coded list (that cannot be // overridden), and then if not found there, we defer to the system registry. // Finally, we scan a secondary hard-coded list to catch types that we can // deduce but that we also want to allow the OS to override. 

The hard-coded lists come a bit earlier in the file: https://cs.chromium.org/chromium/src/net/base/mime_util.cc?l=170 (kPrimaryMappings and kSecondaryMappings).

An example: when uploading a CSV file from a Windows system with Microsoft Excel installed, Chrome will report this as application/vnd.ms-excel. This is because .csv is not specified in the first hard-coded list, so the browser falls back to the system registry. HKEY_CLASSES_ROOT\.csv has a value named Content Type that is set to application/vnd.ms-excel.

Internet Explorer

Again using the same example, the browser will report application/vnd.ms-excel. I think it's reasonable to assume Internet Explorer (version 11 as of writing) uses the registry. Possibly it also makes use of a hard-coded list like Chrome and Firefox, but its closed source nature makes it hard to verify.

Firefox

As indicated in the Chrome code, Firefox (version 32 as of writing) works in a similar way. Snippet from file uriloader\exthandler\nsExternalHelperAppService.cpp, method nsExternalHelperAppService::GetTypeFromExtension

// OK. We want to try the following sources of mimetype information, in this order: // 1. defaultMimeEntries array // 2. User-set preferences (managed by the handler service) // 3. OS-provided information // 4. our "extras" array // 5. Information from plugins // 6. The "ext-to-type-mapping" category 

The hard-coded lists come earlier in the file, somewhere near line 441. You're looking for defaultMimeEntries and extraMimeEntries.

With my current profile, the browser will report text/csv because there's an entry for it in mimeTypes.rdf (item 2 in the list above). With a fresh profile, which does not have this entry, the browser will report application/vnd.ms-excel (item 3 in the list).

Summary

The hard-coded lists in the browsers are pretty limited. Often, the MIME type sent by the browser will be the one reported by the OS. And this is exactly why, as stated in the question, the MIME type reported by the browser is unreliable.

like image 76
user247702 Avatar answered Sep 22 '22 16:09

user247702


Kip, I spent some time reading RFCs, MSDN and MDN. Here is what I could understand. When a browser encounters a file for upload, it looks at the first buffer of data it receives and then runs a test on it. These tests try to determine if the file is a known mime type or not, and if known mime type it will simply further test it for which known mime type and take action accordingly. I think IE tries to do this first rather than just determining the file type from extension. This page explains this for IE http://msdn.microsoft.com/en-us/library/ms775147%28v=vs.85%29.aspx. For firefox, what I could understand was that it tries to read file info from filesystem or directory entry and then determines the file type. Here is a link for FF https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIFile. I would still like to have more authoritative info on this.

like image 25
Kumar Avatar answered Sep 20 '22 16:09

Kumar