There is an http.DetectContentType([]byte) function in the net/http package, but it supports only a limited number of types. How can I add support for docx, doc, xls, xlsx, ppt, pps, odt, ods, and odp files, detecting them by content rather than by extension?
As far as I know, there are some problems, because docx/xlsx/pptx/odp/odt files have the same signature as a zip file (50 4B 03 04).
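To illustrate the problem, here is a minimal sketch that feeds only the shared signature bytes to http.DetectContentType. The sniff helper is a name made up for this example; everything else is the standard library.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff is a hypothetical helper that wraps http.DetectContentType.
func sniff(b []byte) string {
	return http.DetectContentType(b)
}

func main() {
	// Only the 4-byte zip local file header signature. A docx, xlsx,
	// pptx, odt, odp, or plain zip archive all start with these bytes.
	header := []byte{0x50, 0x4B, 0x03, 0x04}
	fmt.Println(sniff(header))
}
```

This prints application/zip, so the standard sniffer alone cannot tell these formats apart.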
Disclaimer: I'm the author of mimetype.
For anyone having the same problem 3 years later, nowadays the packages for mime type detection based on the content are the following:
filetype
magicmime (a libmagic binding; see man magic)
mimetype
Files with an x at the end of the extension are relatively easy to detect. Just unzip the file and read _rels/.rels. It contains the path to the main part of the document, which is denoted by the relationship type http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument. Just check its name: it's document.xml for docx, workbook.xml for xlsx, and presentation.xml for pptx.
More info can be found in ECMA-376.
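The steps above can be sketched roughly as follows, using only the standard library. The names ooxmlKind, sampleZip, and relationships are made up for this example, and the sketch assumes the whole file fits in memory and that the relationship target ends in one of the three names mentioned above.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"path"
)

const officeDocRel = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"

// relationships mirrors the part of _rels/.rels that we need.
type relationships struct {
	Items []struct {
		Type   string `xml:"Type,attr"`
		Target string `xml:"Target,attr"`
	} `xml:"Relationship"`
}

// ooxmlKind (hypothetical name) returns "docx", "xlsx", or "pptx"
// based on the main part referenced from _rels/.rels.
func ooxmlKind(data []byte) (string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", err
	}
	for _, f := range zr.File {
		if f.Name != "_rels/.rels" {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return "", err
		}
		defer rc.Close()
		var rels relationships
		if err := xml.NewDecoder(rc).Decode(&rels); err != nil {
			return "", err
		}
		for _, r := range rels.Items {
			if r.Type != officeDocRel {
				continue
			}
			switch path.Base(r.Target) {
			case "document.xml":
				return "docx", nil
			case "workbook.xml":
				return "xlsx", nil
			case "presentation.xml":
				return "pptx", nil
			}
		}
	}
	return "", errors.New("no recognized main document part")
}

// sampleZip builds a one-file zip archive in memory, for demonstration only.
func sampleZip(name, content string) []byte {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(name)
	if err != nil {
		panic(err)
	}
	if _, err := w.Write([]byte(content)); err != nil {
		panic(err)
	}
	if err := zw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	rels := `<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">` +
		`<Relationship Id="rId1" Type="` + officeDocRel + `" Target="word/document.xml"/>` +
		`</Relationships>`
	kind, err := ooxmlKind(sampleZip("_rels/.rels", rels))
	fmt.Println(kind, err)
}
```

With a real file you would pass the bytes from os.ReadFile to ooxmlKind instead of the in-memory sample.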
Binary formats are harder to detect. Basically, you need to read the MS-CFB filesystem and check for these entries:
WordDocument for doc
Workbook or Book for xls
PowerPoint Document for ppt
EncryptedPackage means the file is encrypted.
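A rough sketch of that check, using only the standard library, might look like this. The names cfbEntryNames and officeKind are made up for this example. It is deliberately simplified: it assumes the whole file is in memory, it follows the directory chain using only the 109 FAT sector locations stored in the header, and it looks at every directory entry rather than only the children of the root storage.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"os"
	"unicode/utf16"
)

var cfbMagic = []byte{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}

const maxRegSect = 0xFFFFFFFA // largest regular sector number in MS-CFB

// cfbEntryNames (hypothetical name) lists the names of all directory entries.
func cfbEntryNames(data []byte) ([]string, error) {
	if len(data) < 512 || !bytes.HasPrefix(data, cfbMagic) {
		return nil, errors.New("not a CFB file")
	}
	le := binary.LittleEndian
	shift := le.Uint16(data[30:]) // sector shift: 9 (512 B) or 12 (4096 B)
	if shift != 9 && shift != 12 {
		return nil, errors.New("unexpected sector shift")
	}
	size := int64(1) << shift
	// sector returns the bytes of sector n, or nil if it is out of range.
	sector := func(n uint32) []byte {
		start := (int64(n) + 1) * size
		if start+size > int64(len(data)) {
			return nil
		}
		return data[start : start+size]
	}
	// next returns the FAT entry for sector n (the next sector in its chain).
	next := func(n uint32) uint32 {
		perSector := uint32(size / 4)
		if n/perSector >= 109 { // beyond the header DIFAT: not handled here
			return maxRegSect + 1
		}
		fat := sector(le.Uint32(data[76+4*(n/perSector):]))
		if fat == nil {
			return maxRegSect + 1
		}
		return le.Uint32(fat[4*(n%perSector):])
	}
	var names []string
	limit := int64(len(data)) / size // guards against cyclic chains
	for s := le.Uint32(data[48:]); s <= maxRegSect && limit > 0; s, limit = next(s), limit-1 {
		sec := sector(s)
		if sec == nil {
			break
		}
		for off := 0; off+128 <= len(sec); off += 128 {
			e := sec[off : off+128]
			n := int(le.Uint16(e[64:])) // name length in bytes, incl. terminator
			if e[66] == 0 || n < 2 || n > 64 {
				continue // unallocated or malformed entry
			}
			units := make([]uint16, n/2-1)
			for i := range units {
				units[i] = le.Uint16(e[2*i:])
			}
			names = append(names, string(utf16.Decode(units)))
		}
	}
	return names, nil
}

// officeKind (hypothetical name) maps well-known entry names to a format.
func officeKind(data []byte) (string, error) {
	names, err := cfbEntryNames(data)
	if err != nil {
		return "", err
	}
	kind := "unknown"
	for _, name := range names {
		switch name {
		case "EncryptedPackage":
			return "encrypted", nil
		case "WordDocument":
			kind = "doc"
		case "Workbook", "Book":
			kind = "xls"
		case "PowerPoint Document":
			kind = "ppt"
		}
	}
	return kind, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass the path of a doc, xls, or ppt file")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	kind, err := officeKind(data)
	fmt.Println(kind, err)
}
```

For production use, a complete MS-CFB reader that also follows DIFAT sectors and walks the directory tree from the root storage would be more robust than this sketch.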