Alternative to the FindMimeFromData method in Urlmon.dll that supports more MIME types

Tags: mime-types, c#

The FindMimeFromData method, exposed by the Windows DLL Urlmon.dll, can determine the MIME type of data held in memory by examining the first 256 bytes of the byte array in which that data is stored.

However, after reading its documentation, I was led to "MIME Type Detection in Windows Internet Explorer", where I found the list of MIME types this method is able to recognize. As you can see from that list, this method is limited to 26 MIME types.

So I was wondering if anyone could point me to another method with more MIME types, or alternatively another method / class where I would be able to include the MIME types I see fit.

Fábio Antunes asked Mar 08 '13 18:03


2 Answers

UPDATE: @GetoX has taken this code and wrapped it in a NuGet package for .NET Core! See below, cheers!!

"So I was wondering if anyone could point me to another method with more MIME types, or alternatively another method / class where I would be able to include the MIME types I see fit."

I use a hybrid of Winista and URLMon to detect the real format of uploaded files.

Winista MIME Detection

Say someone renames an exe with a jpg extension: you can still determine the "real" file format using binary analysis. It doesn't detect SWF or FLV files, but it handles pretty much every other well-known format, and you can use a hex editor to find signatures and add more file types for it to detect.
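To illustrate the idea of binary analysis, here is a minimal sketch (not Winista's actual code) that compares the first bytes of a file against a handful of well-known signatures. The class name SignatureCheckSketch, its method names, and the choice of "application/x-msdownload" as the label for executables are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SignatureCheckSketch
{
    // A few widely documented file signatures ("magic numbers"), keyed by MIME type.
    // This table is illustrative only; a real detector needs many more entries.
    private static readonly Dictionary<string, byte[]> Signatures = new Dictionary<string, byte[]>
    {
        { "application/x-msdownload", new byte[] { 0x4D, 0x5A } },            // "MZ": Windows executable
        { "image/jpeg", new byte[] { 0xFF, 0xD8, 0xFF } },
        { "image/png", new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A } },
        { "application/pdf", new byte[] { 0x25, 0x50, 0x44, 0x46 } }          // "%PDF"
    };

    // Returns the MIME type whose signature matches the start of the data, or null.
    public static string SniffMime(byte[] header)
    {
        foreach (KeyValuePair<string, byte[]> entry in Signatures)
        {
            byte[] magic = entry.Value;
            if (header.Length >= magic.Length && header.Take(magic.Length).SequenceEqual(magic))
                return entry.Key;
        }
        return null;
    }

    // Reads only the first few bytes of the file, ignoring its extension entirely.
    public static string SniffFile(string path)
    {
        byte[] header = new byte[8];
        int read;
        using (FileStream fs = File.OpenRead(path))
        {
            read = fs.Read(header, 0, header.Length);
        }
        return SniffMime(header.Take(read).ToArray());
    }
}
```

With this sketch, a file named photo.jpg whose content starts with the bytes 4D 5A would be reported as "application/x-msdownload" rather than "image/jpeg", because the extension is never consulted.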

File Magic

Winista detects the real MIME type using an XML file, "mime-type.xml", that contains information about file types and the signatures used to identify the content type, e.g.:

<!--
 !   Audio primary type
 ! -->

<mime-type name="audio/basic"
           description="uLaw/AU Audio File">
    <ext>au</ext><ext>snd</ext>
    <magic offset="0" type="byte" value="2e736e64000000"/>
</mime-type>

<mime-type name="audio/midi"
           description="Musical Instrument Digital Interface MIDI-sequention Sound">
    <ext>mid</ext><ext>midi</ext><ext>kar</ext>
    <magic offset="0" value="MThd"/>
</mime-type>

<mime-type name="audio/mpeg"
           description="MPEG Audio Stream, Layer III">
    <ext>mp3</ext><ext>mp2</ext><ext>mpga</ext>
    <magic offset="0" value="ID3"/>
</mime-type>
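As a rough sketch of how entries like these can drive detection (this is not Winista's implementation), the following code loads the magic elements and matches them against a byte array. It assumes that type="byte" means the value is a hex string, that a value without a type is plain ASCII text, and that the entries sit under some root element; the names MagicXmlSketch, MagicRule, LoadRules and Detect are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

public static class MagicXmlSketch
{
    // One signature: the bytes expected at a given offset and the MIME type they indicate.
    public sealed class MagicRule
    {
        public string MimeType;
        public int Offset;
        public byte[] Signature;
    }

    // Builds the rule list from XML shaped like the mime-type entries shown above.
    public static List<MagicRule> LoadRules(string xml)
    {
        var rules = new List<MagicRule>();
        foreach (XElement type in XDocument.Parse(xml).Descendants("mime-type"))
        {
            foreach (XElement magic in type.Elements("magic"))
            {
                string value = (string)magic.Attribute("value");
                if (string.IsNullOrEmpty(value)) continue;

                // Assumption: type="byte" marks a hex string, anything else is ASCII text.
                bool isHex = (string)magic.Attribute("type") == "byte";
                rules.Add(new MagicRule
                {
                    MimeType = (string)type.Attribute("name"),
                    Offset = (int?)magic.Attribute("offset") ?? 0,
                    Signature = isHex ? HexToBytes(value) : Encoding.ASCII.GetBytes(value)
                });
            }
        }
        return rules;
    }

    // Returns the MIME type of the first rule whose signature matches, or null.
    public static string Detect(byte[] data, IEnumerable<MagicRule> rules)
    {
        foreach (MagicRule rule in rules)
        {
            if (data.Length < rule.Offset + rule.Signature.Length) continue;
            if (data.Skip(rule.Offset).Take(rule.Signature.Length).SequenceEqual(rule.Signature))
                return rule.MimeType;
        }
        return null;
    }

    // Converts a hex string such as "2e736e64" into its bytes.
    private static byte[] HexToBytes(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }
}
```

Keeping the signatures in data rather than in code is what makes it possible to support a new format by editing the XML file alone.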

When Winista fails to detect the real file format, I fall back to the URLMon method:

using System;
using System.IO;
using System.Runtime.InteropServices;

public class urlmonMimeDetect
{
    // Pointer-sized parameters are declared as IntPtr (not UInt32) so the call
    // also works in 64-bit processes; the string parameters are wide strings.
    [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    private static extern int FindMimeFromData(
        IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
        [MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
        uint cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
        uint dwMimeFlags,
        out IntPtr ppwzMimeOut,
        uint dwReserved
    );

    public string GetMimeFromFile(string filename)
    {
        if (!File.Exists(filename))
            throw new FileNotFoundException(filename + " not found");

        byte[] buffer = new byte[256];
        int bytesRead;
        using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
        {
            // Read at most the first 256 bytes; shorter files yield fewer bytes.
            bytesRead = fs.Read(buffer, 0, buffer.Length);
        }
        try
        {
            IntPtr mimeTypePtr;
            // Pass the number of bytes actually read rather than a fixed 256.
            int hr = FindMimeFromData(IntPtr.Zero, null, buffer, (uint)bytesRead, null, 0, out mimeTypePtr, 0);
            if (hr != 0 || mimeTypePtr == IntPtr.Zero)
                return "unknown/unknown";

            string mime = Marshal.PtrToStringUni(mimeTypePtr);
            Marshal.FreeCoTaskMem(mimeTypePtr);
            return mime;
        }
        catch (Exception)
        {
            return "unknown/unknown";
        }
    }
}

From inside the Winista method, I fall back on the URLMon here:

public MimeType GetMimeTypeFromFile(string filePath)
{
    // Read the whole file; Winista's matcher works on a signed byte array.
    byte[] data = File.ReadAllBytes(filePath);
    sbyte[] fileData = Winista.Mime.SupportUtil.ToSByteArray(data);

    MimeType oMimeType = GetMimeType(fileData);
    if (oMimeType != null) return oMimeType;

    // No magic signature matched (e.g. a text/plain file),
    // so use URLMon instead to try to get the file's format.
    Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect urlmonMimeDetect = new Winista.MimeDetect.URLMONMimeDetect.urlmonMimeDetect();
    string urlmonMimeType = urlmonMimeDetect.GetMimeFromFile(filePath);
    if (!string.IsNullOrEmpty(urlmonMimeType))
    {
        // Map the URLMon result back onto one of the known MimeType entries.
        foreach (MimeType mimeType in types)
        {
            if (mimeType.Name == urlmonMimeType)
            {
                return mimeType;
            }
        }
    }

    // Still null at this point: neither method recognized the file.
    return oMimeType;
}

The Winista utility from netomatix is available through the Wayback Machine. AFAIK they found some "mime reader utility classes in open source Nutch crawler system" and rewrote them in C# in the early 2000s.

I've hosted my MimeDetect project, which uses Winista with the URLMon fallback, here (please contribute new file types using a hex editor): https://github.com/MeaningOfLights/MimeDetect

You could also use the Registry method or the .NET 4.5 method mentioned in this post linked to by Paul Zahra, but Winista is the best IMHO.

Enjoy knowing files on your systems are what they claim to be and not laden with malware!


UPDATE:

For desktop applications you may find the WindowsAPICodePack works better:

using Microsoft.WindowsAPICodePack.Shell;
using Microsoft.WindowsAPICodePack.Shell.PropertySystem;

private static string GetFilePropertyItemTypeTextValueFromShellFile(string filePathWithExtension)
{
    // PItemTypeTextCanonical is not defined in this snippet; it has to be a
    // canonical property name declared elsewhere in the calling class.
    var shellFile = ShellFile.FromFilePath(filePathWithExtension);
    var prop = shellFile.Properties.GetProperty(PItemTypeTextCanonical);
    return prop.FormatForDisplay(PropertyDescriptionFormatOptions.None);
}

Jeremy Thompson answered Sep 28 '22 02:09


After a few hours of looking for a flexible solution, I took @JeremyThompson's solution, adapted it to .NET Core and .NET 4.5, and put it into a NuGet package.

// init
var mimeTypes = new MimeTypes();

// usage by file path
var mimeType1 = mimeTypes.GetMimeTypeFromFile(filePath);

// usage by byte array
var mimeType2 = mimeTypes.GetMimeTypeFromFile(bytes);

GetoX answered Sep 28 '22 02:09