
Use File Content to Determine MIME Type with Node JS

It seems all of the popular MIME type libraries for Node.js just use the file name extension rather than peeking into the file to determine the MIME type.

Is there a good way to use Node to jump into the file and intelligently determine the file's MIME type in case an extension is not present?

Kirk Ouimet asked Jul 09 '14 20:07


2 Answers

It is indeed a pity that most popular MIME modules just map the file extension to a type.

After digging deeper, I found a module called mmmagic, which seems to do exactly what you want.

Be aware that, in my experience, MIME detection is in principle not completely reliable, and there is a small chance of false detections.

Example usage (taken from their site):

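  var mmm = require('mmmagic'),
      Magic = mmm.Magic;

  // MAGIC_MIME_TYPE asks for a MIME type rather than a textual description
  var magic = new Magic(mmm.MAGIC_MIME_TYPE);
  // detectFile inspects the file's contents and calls back with (err, result)
  magic.detectFile('node_modules/mmmagic/build/Release/magic.node', function(err, result) {
      if (err) throw err;
      console.log(result);
      // output on Windows with 32-bit node:
      //    application/x-dosexec
  });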
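
If you prefer promises over callbacks, the call above can be wrapped with Node's built-in `util.promisify`. This is only a sketch: `detectMime` is a hypothetical helper name, and it assumes nothing about mmmagic beyond the `detectFile(path, callback)` signature shown in the example.

```javascript
const { promisify } = require('util');

// Hypothetical helper: wraps magic.detectFile(path, cb) in a Promise.
// `magic` is expected to be an instance such as new Magic(mmm.MAGIC_MIME_TYPE).
function detectMime(magic, filePath) {
  // bind() keeps `this` pointing at the Magic instance inside detectFile
  return promisify(magic.detectFile.bind(magic))(filePath);
}
```

With mmmagic installed, usage would look like `detectMime(new Magic(mmm.MAGIC_MIME_TYPE), 'some/file').then(console.log)`.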
alandarev answered Sep 21 '22 10:09


Since MIME itself dictates nothing about the format of a file's contents, you can only employ heuristics to guess what is in a file:

  1. Some binary formats start with a so-called magic number, but those can be wrong or ambiguous. See this Wikipedia article for more info.

  2. Many text file formats contain grammar constructs that you can use for a simple pattern matching test, e.g. XML, CSV or JSON. However, some formats (e.g. HTML) have a rather "evolved" syntax definition, which makes them ambiguous and thus hard to pattern match.

To better illustrate the issue of ambiguity, here is an example: browsers have developed a very high tolerance and accept anything that remotely resembles HTML, so an HTML (or even XHTML) file is hard to identify. On top of that, files that look like HTML could actually be written in non-HTML template languages (such as jade, handlebars, angular templates, etc.). This is just one of many examples where things get very ambiguous.
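
As a rough illustration of both heuristics, here is a minimal sketch using only Node's built-in `fs` module. The names `SIGNATURES`, `sniffBuffer` and `sniffFile` are made up for this example, and the signature table is deliberately tiny; a real database such as the one behind libmagic is far larger.

```javascript
const fs = require('fs');

// A handful of well-known magic numbers (illustrative, nowhere near complete).
const SIGNATURES = [
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] },             // "GIF8"
  { mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46, 0x2d] }, // "%PDF-"
  // Ambiguous on purpose: docx, jar, apk and others are ZIP containers too.
  { mime: 'application/zip', bytes: [0x50, 0x4b, 0x03, 0x04] },
];

// Guess a MIME type from a Buffer; returns null when nothing matches.
function sniffBuffer(buf) {
  // Heuristic 1: compare the leading bytes against known magic numbers.
  for (const sig of SIGNATURES) {
    if (buf.length >= sig.bytes.length &&
        sig.bytes.every((b, i) => buf[i] === b)) {
      return sig.mime;
    }
  }
  // Heuristic 2: simple pattern tests for text formats.
  const text = buf.toString('utf8').trim();
  if (text.startsWith('<?xml')) return 'application/xml';
  if (text.startsWith('{') || text.startsWith('[')) {
    try {
      JSON.parse(text); // only succeeds if the whole text is valid JSON
      return 'application/json';
    } catch (e) {
      // not JSON (or truncated JSON); fall through
    }
  }
  return null;
}

// Read at most maxBytes from the start of the file and sniff them.
// Caveat: JSON files larger than maxBytes are truncated and will not parse.
function sniffFile(path, maxBytes = 4096) {
  const fd = fs.openSync(path, 'r');
  try {
    const buf = Buffer.alloc(maxBytes);
    const bytesRead = fs.readSync(fd, buf, 0, maxBytes, 0);
    return sniffBuffer(buf.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

A `null` result means "unknown", not "plain text"; the caller still has to decide on a fallback such as the file extension.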
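
To make the ambiguity concrete, here is a small sketch that collects every type a snippet of text could plausibly be, instead of returning a single answer. `candidateTypes` and its regular expressions are invented for this illustration and are far cruder than what a browser does; `text/x-handlebars-template` is an unofficial label commonly seen in script tags, not a registered MIME type.

```javascript
// Return every MIME type the text plausibly matches (possibly several, possibly none).
function candidateTypes(text) {
  const t = text.trim();
  const candidates = [];
  // An <html> tag or an HTML doctype anywhere in the text
  if (/<!doctype\s+html|<html[\s>]/i.test(t)) candidates.push('text/html');
  // An XML declaration at the very start
  if (/^<\?xml\s/.test(t)) candidates.push('application/xml');
  // The XHTML namespace
  if (/xmlns=["']http:\/\/www\.w3\.org\/1999\/xhtml["']/.test(t)) {
    candidates.push('application/xhtml+xml');
  }
  // Mustache-style placeholders such as {{title}}
  if (/\{\{[^{}]+\}\}/.test(t)) candidates.push('text/x-handlebars-template');
  return candidates;
}

const sample = [
  '<?xml version="1.0"?>',
  '<html xmlns="http://www.w3.org/1999/xhtml">',
  '<body><h1>{{title}}</h1></body></html>',
].join('\n');

console.log(candidateTypes(sample));
// [ 'text/html', 'application/xml', 'application/xhtml+xml', 'text/x-handlebars-template' ]
```

All four answers are defensible for the same input, which is exactly why content-based detection can only ever be a best guess.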

Domi answered Sep 18 '22 10:09