A MIME type has two parts: a type and a subtype, separated by a slash (/). For example, for Microsoft Word files the type is application and the subtype is msword. Together, the complete MIME type is application/msword.
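As a rough illustration of that structure, here is a minimal Python sketch that splits a MIME type string into its type and subtype. The function name split_mime_type is made up for this example, and the handling of parameters such as "; charset=utf-8" is a simplification, not a full parser.

```python
def split_mime_type(mime_type: str) -> tuple[str, str]:
    """Split 'type/subtype' into its two parts, ignoring any parameters."""
    # Drop optional parameters such as '; charset=utf-8' (simplified handling).
    essence = mime_type.split(";", 1)[0].strip().lower()
    main_type, slash, subtype = essence.partition("/")
    if not slash or not main_type or not subtype:
        raise ValueError(f"not a valid MIME type: {mime_type!r}")
    return main_type, subtype


print(split_mime_type("application/msword"))  # ('application', 'msword')
```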
For detecting MIME types, use the aptly named "mimetype" command. It has a number of options for formatting the output, and it even has an option for backward compatibility with "file". Best of all, it accepts input not only as a file but also via stdin/pipe, so you can avoid temporary files when processing streams.
The thing about MIME types is they're almost entirely fictional.
MIME and HTTP ask us to pretend that all of our files have a piece of metadata identifying the "content type". When we send files around the network, the "content type" metadata goes with them, so nobody ever misinterprets the content of a file.
The truth is this metadata doesn't exist. By the time MIME was invented, it was really too late to convince any OS vendors to adopt a new type system for files. Unix had settled on magic numbers, DOS had settled on 3-letter filename suffixes, and classic MacOS had its creator codes and type codes. (MacOS type codes were closest to the MIME model, since they actually were separate from both the filename and the content. But since they were only 4 characters long, MIME types wouldn't fit in them.)
Nobody stores MIME-compatible content types in their filesystem. When a MIME message composer or HTTP server wants to send a file, it decides the file type in the traditional way (filename suffix and/or magic number) and maps the result to a MIME type.
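To make the "maps the result to a MIME type" step concrete, here is a small sketch using Python's standard mimetypes module, which does the filename-suffix lookup. The helper name content_type_for and the fallback to application/octet-stream are assumptions for illustration; real servers and mail composers have their own tables and defaults.

```python
import mimetypes


def content_type_for(filename: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the filename suffix only; the content is never read."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or default


for name in ("report.pdf", "index.html", "notes.no-such-suffix"):
    print(name, "->", content_type_for(name))
```

Note that a file named report.pdf gets application/pdf here even if it contains something else entirely, which is exactly the kind of guess described above.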
In contrast to the theory (where MIME eliminates file type guessing), MIME as implemented in practice has moved the "guess file type based on filename suffix and/or magic number" logic from the receiver of the file to the sender. As you have noticed, the sender doesn't usually do a better job than the receiver would have done if forced to figure it out for itself. Frequently in the case of a web server, the server's eagerness to slap a Content-Type on a file makes things worse. There's no reason for a web server to know anything about the format of files it serves when it is only being used to distribute them and has no need to interpret their contents.
The file command guesses file type by reading the content and looking for magic numbers and strings. The -I option doesn't change that. It just chooses a different output format.
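The idea of guessing from magic numbers can be sketched in a few lines of Python. The signature table below is a tiny hand-picked sample and sniff_mime_type is a hypothetical name; the real file command consults a far larger magic database and also recognises text, which this sketch does not.

```python
# Hypothetical, deliberately tiny signature table (leading bytes -> MIME type).
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def sniff_mime_type(data: bytes) -> str:
    """Return a MIME type based only on the leading bytes of the content."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    # Nothing matched: fall back to the generic "unknown bytes" type.
    return "application/octet-stream"


print(sniff_mime_type(b"%PDF-1.7\n..."))           # application/pdf
print(sniff_mime_type(b"no recognisable header"))  # application/octet-stream
```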
To change the Content-Type header that a web server sends for a specific file, you should be looking in your web server's configuration manual. There's nothing you can do to the file itself.
It's a bit of a category mistake to talk about "the MIME type of a file": "files" don't have MIME types; only octet streams have them. (I'm not necessarily disagreeing with @wumpus-q-wumbley's description of MIME types as "fictional", but this is another way of thinking about it.)
MIME stands for Multipurpose Internet Mail Extensions, as originally described in RFC 2045, and MIME types were originally intended to describe what a receiver is supposed to do with the bunch of bytes soon to follow down the wire, in the rest of the email message. They were very naturally repurposed in (for example) the HTTP protocol, to let a client understand how it is to interpret the bytes in the HTTP response whose header carries the MIME type.
The fact that the file command can display a MIME type suggests the further extension of the idea, to act as the key which lets a windowing system look up the name of an application which should be used to open the file.
Thus, if "the MIME type of a file" means anything, it means "the MIME type which a web server would prefix to this file if it were to be delivered in response to an HTTP request" (or something like that). Thought of like that, it's clear that the MIME type is part of the web server's configuration, and not anything intrinsic to the file: a single file might be delivered with various MIME types depending on the URL which retrieves it, and details of the request and configuration. Thus an XHTML file might be delivered as text/html or application/xml or application/octet-stream depending on the details of the HTTP request, the directory the file's located in, or indeed the phase of the moon (the latter would be an unhelpful server configuration).
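To illustrate how the same file can end up with different MIME types, here is a toy sketch in Python of a server-side rule table. Everything in it (the CONTENT_TYPE_RULES list, the URL patterns, the function name) is invented for the example and is not taken from any real web server's configuration format.

```python
from fnmatch import fnmatchcase

# Hypothetical server configuration: the first matching URL pattern wins.
CONTENT_TYPE_RULES = [
    ("/downloads/*", "application/octet-stream"),
    ("/xml/*", "application/xml"),
    ("*.xhtml", "text/html"),
]


def content_type_for_url(url_path: str, default: str = "application/octet-stream") -> str:
    """Pick a Content-Type from the URL alone; the file's bytes play no part."""
    for pattern, content_type in CONTENT_TYPE_RULES:
        if fnmatchcase(url_path, pattern):
            return content_type
    return default


# The same page.xhtml, reachable under three URLs, gets three different types.
for path in ("/page.xhtml", "/xml/page.xhtml", "/downloads/page.xhtml"):
    print(path, "->", content_type_for_url(path))
```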
A web server might have a number of mechanisms for deciding on this MIME type, which might include a lookup table based on any file extension, a .htaccess file, or indeed the output of the file command.
So the answer to your question is: it depends.
If you want to change the MIME type that your web server sends, look at the server's configuration, or at the /etc/mime.types file (if your system uses that and if the server is configured to fall back on that).
If you care about the output of the file command specifically, for some other reason, then man file is your friend, and you'll probably need to grub around in the magic numbers file, reasonably carefully.
If you have a PDF, and the file --mime-type command answers application/octet-stream and not application/pdf, your file is corrupted in some way.
PDF readers will open it and ignore the problem, but if you upload the file to a web application, the application will see its MIME type as application/octet-stream. That can be a problem, mainly if the application validates the MIME type (I sometimes have this problem in my application).
For a quick fix, rewrite the file with Ghostscript like this:
gs -o new.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress old.pdf
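Before rewriting a file with the command above, it can help to see whether the PDF header is where a strict checker expects it. One kind of damage that could produce this symptom is stray bytes in front of the %PDF- marker; that is an assumption about the cause, not something the file command reports. The sketch below is in Python, the function names are made up, and the 1024-byte search window is also an assumption, chosen to mirror how lenient some readers are said to be.

```python
def pdf_header_offset(data: bytes, search_window: int = 1024) -> int:
    """Return the offset of the '%PDF-' marker, or -1 if it is not in the window."""
    return data[:search_window].find(b"%PDF-")


def describe_pdf(data: bytes) -> str:
    """Summarise where (or whether) the PDF header was found."""
    offset = pdf_header_offset(data)
    if offset == 0:
        return "header at offset 0"
    if offset > 0:
        return f"{offset} stray byte(s) before the header"
    return "no PDF header found"


print(describe_pdf(b"%PDF-1.4\n..."))      # header at offset 0
print(describe_pdf(b"\n\n%PDF-1.4\n..."))  # 2 stray byte(s) before the header
```

To check a real file, read its first kilobyte in binary mode and pass the bytes in, for example describe_pdf(open("old.pdf", "rb").read(1024)).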