Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extracting Windows File Properties (Custom Properties) C#

In Word/Excel you have to possibility to add Custom properties. (See Image) Custom Properties. As you guys can see there is the field: "Properties:", you can add any information you want there. When you save the file and you go to the file location in the folder, you can right click -> Properties and you have all the tabs: General/Security/Details/Previous Versions. with the feature you add the tab Custom.

Now I want to get this information through coding: Custom Properties information. and extract it later to notepad. So far i used the Shell32 but then I only get the information that is in the Details tab. I did some research and saw some possibilities with DSOfile.dll. But I want to know if there is a possibility to do this without installing other DLL? This is my code so far with the Shell32.

        static void Main(string[] args)
    {

        //using (StreamWriter writer = new StreamWriter(@"filepathhere"))
        //{
            //Console.SetOut(writer);
            ReadProperties();
        //}
    }
    static void ReadProperties()
    { 
        List<string> arrHeaders = new List<string>();
        Shell shell = new Shell();
        Folder objFolder = shell.NameSpace(@"filepathhere");
        FolderItem objFolderItem = objFolder.ParseName("filehere.doc");

        for (int i = 0; i < short.MaxValue; i++)
        {
            string header = objFolder.GetDetailsOf(objFolder, i);
            if (String.IsNullOrEmpty(header))
                break;
            arrHeaders.Add(header);
        }

        for ( int i = 0; i < arrHeaders.Count; i++)
        {
            Console.WriteLine("{0}\t{1}: {2}", i, arrHeaders[i], objFolder.GetDetailsOf(objFolderItem, i));
        }
        Console.ReadKey();
    }

Thanks in advance!

Desu

like image 658
Desutoroiya Avatar asked Feb 17 '16 08:02

Desutoroiya


3 Answers

Historically these properties were defined by the technology called "Structured Storage". The first Structured Storage implementation is the ancient (but still very alive) Compound File Format

After that, Microsoft added Structured Storage capabilities directly into NTFS. This allows you to define properties like author or title on any files (even .txt) files. Although the Explorer UI does not let you do this anymore for some reason, I think it still works programmatically.

And then, with Vista, Microsoft rebooted all that and introduced a superset of all this: the Windows Property System.

Unfortunately, there is no .NET API in the framework for all this. But Microsoft created an open source .NET library called the Windows API CodePack. So, the easiest way for you to extract properties is to add a reference to the WindowsAPICodePack Core NugetPackage and you can use it like this:

static void Main(string[] args)
{
    foreach (var prop in new ShellPropertyCollection(@"mypath\myfile"))
    {
        Console.WriteLine(prop.CanonicalName + "=" + prop.ValueAsObject);
    }
}

If you don't want to add extra DLLs, then you can extract the ShellPropertyCollection code from the WindowsAPICodePack source (Windows API Code Pack: Where is it?). It's quite a work but doable.

Another solution in your case, is to use the old Structure Storage native API. I've provided a sample that does this. Here is how you can use it:

static void Main(string[] args)
{
    foreach (var prop in new StructuredStorage(@"mypath\myfile").Properties)
    {
        Console.WriteLine(prop.Name + "=" + prop.Value);
    }            
}

public sealed class StructuredStorage
{
    public static readonly Guid SummaryInformationFormatId = new Guid("{F29F85E0-4FF9-1068-AB91-08002B27B3D9}");
    public static readonly Guid DocSummaryInformationFormatId = new Guid("{D5CDD502-2E9C-101B-9397-08002B2CF9AE}");
    public static readonly Guid UserDefinedPropertiesId = new Guid("{D5CDD505-2E9C-101B-9397-08002B2CF9AE}");

    private List<StructuredProperty> _properties = new List<StructuredProperty>();

    public StructuredStorage(string filePath)
    {
        if (filePath == null)
            throw new ArgumentNullException("filePath");

        FilePath = filePath;
        IPropertySetStorage propertySetStorage;
        int hr = StgOpenStorageEx(FilePath, STGM.STGM_READ | STGM.STGM_SHARE_DENY_NONE | STGM.STGM_DIRECT_SWMR, STGFMT.STGFMT_ANY, 0, IntPtr.Zero, IntPtr.Zero, typeof(IPropertySetStorage).GUID, out propertySetStorage);
        if (hr == STG_E_FILENOTFOUND || hr == STG_E_PATHNOTFOUND)
            throw new FileNotFoundException(null, FilePath);

        if (hr != 0)
            throw new Win32Exception(hr);

        try
        {
            LoadPropertySet(propertySetStorage, SummaryInformationFormatId);
            LoadPropertySet(propertySetStorage, DocSummaryInformationFormatId);
        }
        finally
        {
            Marshal.ReleaseComObject(propertySetStorage);
        }

        // for some reason we can't read this one on the same COM ref?
        LoadProperties(UserDefinedPropertiesId);
    }

    public string FilePath { get; private set; }
    public IReadOnlyList<StructuredProperty> Properties
    {
        get
        {
            return _properties;
        }
    }

    private void LoadPropertySet(IPropertySetStorage propertySetStorage, Guid fmtid)
    {
        IPropertyStorage propertyStorage;
        int hr = propertySetStorage.Open(fmtid, STGM.STGM_READ | STGM.STGM_SHARE_EXCLUSIVE, out propertyStorage);
        if (hr == STG_E_FILENOTFOUND || hr == STG_E_ACCESSDENIED)
            return;

        if (hr != 0)
            throw new Win32Exception(hr);

        IEnumSTATPROPSTG es;
        propertyStorage.Enum(out es);
        if (es == null)
            return;

        try
        {
            var stg = new STATPROPSTG();
            int fetched;
            do
            {
                hr = es.Next(1, ref stg, out fetched);
                if (hr != 0 && hr != 1)
                    throw new Win32Exception(hr);

                if (fetched == 1)
                {
                    string name = GetPropertyName(fmtid, propertyStorage, stg);

                    var propsec = new PROPSPEC[1];
                    propsec[0] = new PROPSPEC();
                    propsec[0].ulKind = stg.lpwstrName != null ? PRSPEC.PRSPEC_LPWSTR : PRSPEC.PRSPEC_PROPID;
                    IntPtr lpwstr = IntPtr.Zero;
                    if (stg.lpwstrName != null)
                    {
                        lpwstr = Marshal.StringToCoTaskMemUni(stg.lpwstrName);
                        propsec[0].union.lpwstr = lpwstr;
                    }
                    else
                    {
                        propsec[0].union.propid = stg.propid;
                    }

                    var vars = new PROPVARIANT[1];
                    vars[0] = new PROPVARIANT();
                    try
                    {
                        hr = propertyStorage.ReadMultiple(1, propsec, vars);
                        if (hr != 0)
                            throw new Win32Exception(hr);

                    }
                    finally
                    {
                        if (lpwstr != IntPtr.Zero)
                        {
                            Marshal.FreeCoTaskMem(lpwstr);
                        }
                    }

                    object value;
                    try
                    {
                        switch (vars[0].vt)
                        {
                            case VARTYPE.VT_BOOL:
                                value = vars[0].union.boolVal != 0 ? true : false;
                                break;

                            case VARTYPE.VT_BSTR:
                                value = Marshal.PtrToStringUni(vars[0].union.bstrVal);
                                break;

                            case VARTYPE.VT_CY:
                                value = decimal.FromOACurrency(vars[0].union.cyVal);
                                break;

                            case VARTYPE.VT_DATE:
                                value = DateTime.FromOADate(vars[0].union.date);
                                break;

                            case VARTYPE.VT_DECIMAL:
                                IntPtr dec = IntPtr.Zero;
                                Marshal.StructureToPtr(vars[0], dec, false);
                                value = Marshal.PtrToStructure(dec, typeof(decimal));
                                break;

                            case VARTYPE.VT_DISPATCH:
                                value = Marshal.GetObjectForIUnknown(vars[0].union.pdispVal);
                                break;

                            case VARTYPE.VT_ERROR:
                            case VARTYPE.VT_HRESULT:
                                value = vars[0].union.scode;
                                break;

                            case VARTYPE.VT_FILETIME:
                                value = DateTime.FromFileTime(vars[0].union.filetime);
                                break;

                            case VARTYPE.VT_I1:
                                value = vars[0].union.cVal;
                                break;

                            case VARTYPE.VT_I2:
                                value = vars[0].union.iVal;
                                break;

                            case VARTYPE.VT_I4:
                                value = vars[0].union.lVal;
                                break;

                            case VARTYPE.VT_I8:
                                value = vars[0].union.hVal;
                                break;

                            case VARTYPE.VT_INT:
                                value = vars[0].union.intVal;
                                break;

                            case VARTYPE.VT_LPSTR:
                                value = Marshal.PtrToStringAnsi(vars[0].union.pszVal);
                                break;

                            case VARTYPE.VT_LPWSTR:
                                value = Marshal.PtrToStringUni(vars[0].union.pwszVal);
                                break;

                            case VARTYPE.VT_R4:
                                value = vars[0].union.fltVal;
                                break;

                            case VARTYPE.VT_R8:
                                value = vars[0].union.dblVal;
                                break;

                            case VARTYPE.VT_UI1:
                                value = vars[0].union.bVal;
                                break;

                            case VARTYPE.VT_UI2:
                                value = vars[0].union.uiVal;
                                break;

                            case VARTYPE.VT_UI4:
                                value = vars[0].union.ulVal;
                                break;

                            case VARTYPE.VT_UI8:
                                value = vars[0].union.uhVal;
                                break;

                            case VARTYPE.VT_UINT:
                                value = vars[0].union.uintVal;
                                break;

                            case VARTYPE.VT_UNKNOWN:
                                value = Marshal.GetObjectForIUnknown(vars[0].union.punkVal);
                                break;

                            default:
                                value = null;
                                break;
                        }
                    }
                    finally
                    {
                        PropVariantClear(ref vars[0]);
                    }

                    var property = new StructuredProperty(fmtid, name, stg.propid);
                    property.Value = value;
                    _properties.Add(property);
                }
            }
            while (fetched == 1);
        }
        finally
        {
            Marshal.ReleaseComObject(es);
        }
    }

    private static string GetPropertyName(Guid fmtid, IPropertyStorage propertyStorage, STATPROPSTG stg)
    {
        if (!string.IsNullOrEmpty(stg.lpwstrName))
            return stg.lpwstrName;

        var propids = new int[1];
        propids[0] = stg.propid;
        var names = new string[1];
        names[0] = null;
        int hr = propertyStorage.ReadPropertyNames(1, propids, names);
        if (hr == 0)
            return names[0];

        return null;
    }

    public void LoadProperties(Guid formatId)
    {
        IPropertySetStorage propertySetStorage;
        int hr = StgOpenStorageEx(FilePath, STGM.STGM_READ | STGM.STGM_SHARE_DENY_NONE | STGM.STGM_DIRECT_SWMR, STGFMT.STGFMT_ANY, 0, IntPtr.Zero, IntPtr.Zero, typeof(IPropertySetStorage).GUID, out propertySetStorage);
        if (hr == STG_E_FILENOTFOUND || hr == STG_E_PATHNOTFOUND)
            throw new FileNotFoundException(null, FilePath);

        if (hr != 0)
            throw new Win32Exception(hr);

        try
        {
            LoadPropertySet(propertySetStorage, formatId);
        }
        finally
        {
            Marshal.ReleaseComObject(propertySetStorage);
        }
    }

    private const int STG_E_FILENOTFOUND = unchecked((int)0x80030002);
    private const int STG_E_PATHNOTFOUND = unchecked((int)0x80030003);
    private const int STG_E_ACCESSDENIED = unchecked((int)0x80030005);

    private enum PRSPEC
    {
        PRSPEC_LPWSTR = 0,
        PRSPEC_PROPID = 1
    }

    private enum STGFMT
    {
        STGFMT_ANY = 4,
    }

    [Flags]
    private enum STGM
    {
        STGM_READ = 0x00000000,
        STGM_READWRITE = 0x00000002,
        STGM_SHARE_DENY_NONE = 0x00000040,
        STGM_SHARE_DENY_WRITE = 0x00000020,
        STGM_SHARE_EXCLUSIVE = 0x00000010,
        STGM_DIRECT_SWMR = 0x00400000
    }

    // we only define what we handle
    private enum VARTYPE : short
    {
        VT_I2 = 2,
        VT_I4 = 3,
        VT_R4 = 4,
        VT_R8 = 5,
        VT_CY = 6,
        VT_DATE = 7,
        VT_BSTR = 8,
        VT_DISPATCH = 9,
        VT_ERROR = 10,
        VT_BOOL = 11,
        VT_UNKNOWN = 13,
        VT_DECIMAL = 14,
        VT_I1 = 16,
        VT_UI1 = 17,
        VT_UI2 = 18,
        VT_UI4 = 19,
        VT_I8 = 20,
        VT_UI8 = 21,
        VT_INT = 22,
        VT_UINT = 23,
        VT_HRESULT = 25,
        VT_LPSTR = 30,
        VT_LPWSTR = 31,
        VT_FILETIME = 64,
    }

    [StructLayout(LayoutKind.Explicit)]
    private struct PROPVARIANTunion
    {
        [FieldOffset(0)]
        public sbyte cVal;
        [FieldOffset(0)]
        public byte bVal;
        [FieldOffset(0)]
        public short iVal;
        [FieldOffset(0)]
        public ushort uiVal;
        [FieldOffset(0)]
        public int lVal;
        [FieldOffset(0)]
        public uint ulVal;
        [FieldOffset(0)]
        public int intVal;
        [FieldOffset(0)]
        public uint uintVal;
        [FieldOffset(0)]
        public long hVal;
        [FieldOffset(0)]
        public ulong uhVal;
        [FieldOffset(0)]
        public float fltVal;
        [FieldOffset(0)]
        public double dblVal;
        [FieldOffset(0)]
        public short boolVal;
        [FieldOffset(0)]
        public int scode;
        [FieldOffset(0)]
        public long cyVal;
        [FieldOffset(0)]
        public double date;
        [FieldOffset(0)]
        public long filetime;
        [FieldOffset(0)]
        public IntPtr bstrVal;
        [FieldOffset(0)]
        public IntPtr pszVal;
        [FieldOffset(0)]
        public IntPtr pwszVal;
        [FieldOffset(0)]
        public IntPtr punkVal;
        [FieldOffset(0)]
        public IntPtr pdispVal;
    }

    [StructLayout(LayoutKind.Sequential)]
    private struct PROPSPEC
    {
        public PRSPEC ulKind;
        public PROPSPECunion union;
    }

    [StructLayout(LayoutKind.Explicit)]
    private struct PROPSPECunion
    {
        [FieldOffset(0)]
        public int propid;
        [FieldOffset(0)]
        public IntPtr lpwstr;
    }

    [StructLayout(LayoutKind.Sequential)]
    private struct PROPVARIANT
    {
        public VARTYPE vt;
        public ushort wReserved1;
        public ushort wReserved2;
        public ushort wReserved3;
        public PROPVARIANTunion union;
    }

    [StructLayout(LayoutKind.Sequential)]
    private struct STATPROPSTG
    {
        [MarshalAs(UnmanagedType.LPWStr)]
        public string lpwstrName;
        public int propid;
        public VARTYPE vt;
    }

    [StructLayout(LayoutKind.Sequential)]
    private struct STATPROPSETSTG
    {
        public Guid fmtid;
        public Guid clsid;
        public uint grfFlags;
        public System.Runtime.InteropServices.ComTypes.FILETIME mtime;
        public System.Runtime.InteropServices.ComTypes.FILETIME ctime;
        public System.Runtime.InteropServices.ComTypes.FILETIME atime;
        public uint dwOSVersion;
    }

    [DllImport("ole32.dll")]
    private static extern int StgOpenStorageEx([MarshalAs(UnmanagedType.LPWStr)] string pwcsName, STGM grfMode, STGFMT stgfmt, int grfAttrs, IntPtr pStgOptions, IntPtr reserved2, [MarshalAs(UnmanagedType.LPStruct)] Guid riid, out IPropertySetStorage ppObjectOpen);

    [DllImport("ole32.dll")]
    private static extern int PropVariantClear(ref PROPVARIANT pvar);

    [Guid("0000013B-0000-0000-C000-000000000046"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IEnumSTATPROPSETSTG
    {
        [PreserveSig]
        int Next(int celt, ref STATPROPSETSTG rgelt, out int pceltFetched);
        // rest ommited
    }

    [Guid("00000139-0000-0000-C000-000000000046"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IEnumSTATPROPSTG
    {
        [PreserveSig]
        int Next(int celt, ref STATPROPSTG rgelt, out int pceltFetched);
        // rest ommited
    }

    [Guid("00000138-0000-0000-C000-000000000046"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IPropertyStorage
    {
        [PreserveSig]
        int ReadMultiple(uint cpspec, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)] PROPSPEC[] rgpspec, [Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)] PROPVARIANT[] rgpropvar);
        [PreserveSig]
        int WriteMultiple(uint cpspec, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)]  PROPSPEC[] rgpspec, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)]  PROPVARIANT[] rgpropvar, uint propidNameFirst);
        [PreserveSig]
        int DeleteMultiple(uint cpspec, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)] PROPSPEC[] rgpspec);
        [PreserveSig]
        int ReadPropertyNames(uint cpropid, [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 0)] int[] rgpropid, [Out, MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.LPWStr, SizeParamIndex = 0)] string[] rglpwstrName);
        [PreserveSig]
        int NotDeclared1();
        [PreserveSig]
        int NotDeclared2();
        [PreserveSig]
        int Commit(uint grfCommitFlags);
        [PreserveSig]
        int NotDeclared3();
        [PreserveSig]
        int Enum(out IEnumSTATPROPSTG ppenum);
        // rest ommited
    }

    [Guid("0000013A-0000-0000-C000-000000000046"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    private interface IPropertySetStorage
    {
        [PreserveSig]
        int Create([MarshalAs(UnmanagedType.LPStruct)] Guid rfmtid, [MarshalAs(UnmanagedType.LPStruct)] Guid pclsid, uint grfFlags, STGM grfMode, out IPropertyStorage ppprstg);
        [PreserveSig]
        int Open([MarshalAs(UnmanagedType.LPStruct)] Guid rfmtid, STGM grfMode, out IPropertyStorage ppprstg);
        [PreserveSig]
        int NotDeclared3();
        [PreserveSig]
        int Enum(out IEnumSTATPROPSETSTG ppenum);
    }
}

public sealed class StructuredProperty
{
    public StructuredProperty(Guid formatId, string name, int id)
    {
        FormatId = formatId;
        Name = name;
        Id = id;
    }

    public Guid FormatId { get; private set; }
    public string Name { get; private set; }
    public int Id { get; private set; }
    public object Value { get; set; }

    public override string ToString()
    {
        return Name;
    }
}
like image 56
Simon Mourier Avatar answered Nov 13 '22 10:11

Simon Mourier


You can try NPOI engine to extract properties from different Office files (doc, xls, xlsx, docx, etc). This component doesn't have any third party dependencies and you don't need Office to use it.

However, this library is a bit tricky because you need to use different types of property extractors for different type of files. Good code sample can be found in official Git Hub repository TestHPSFPropertiesExtractor.

NuGet package can be found here.

like image 2
Nikita Avatar answered Nov 13 '22 09:11

Nikita


There's a set of NuGet packages called NetOffice that you can use. There are packages for each of the Office applications, plus a few core assemblies. Get NetOffice.Word and NetOffice.Excel and install them in your solution. There's a little documentation at the Codeplex site, but I had to browse the source to really understand what was going on. Here's a sample program:

using NetOffice.OfficeApi;
using System;

namespace Office_Doc_Reader
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var wordApp = new NetOffice.WordApi.Application())
            using (var excelApp = new NetOffice.ExcelApi.Application())
            {
                var doc = wordApp.Documents.Open("C:\\Users\\John\\Desktop\\test.docx");
                var xls = excelApp.Workbooks.Open("C:\\Users\\John\\Desktop\\test.xlsx");

                var customProperties = (DocumentProperties)doc.CustomDocumentProperties;

                foreach (var property in customProperties)
                {
                    Console.WriteLine(String.Format("Name: {0}, Value: {1}, Type: {2}", property.Name, property.Value, property.Type));
                }

                customProperties = (DocumentProperties)xls.CustomDocumentProperties;

                foreach (var property in customProperties)
                {
                    Console.WriteLine(String.Format("Name: {0}, Value: {1}, Type: {2}", property.Name, property.Value, property.Type));
                }
            }

            Console.ReadKey();
        }
    }
}

That shows the following results:

Name: Custom prop 1, Value: Text Value, Type: msoPropertyTypeString
Name: Custom prop 2, Value: 2/21/2016 12:00:00 AM, Type: msoPropertyTypeDate
Name: Custom prop 3, Value: 42, Type: msoPropertyTypeNumber
Name: Custom prop 4, Value: True, Type: msoPropertyTypeBoolean
Name: Foo, Value: abc, Type: msoPropertyTypeString
Name: Bar, Value: 1/1/1970 12:00:00 AM, Type: msoPropertyTypeDate
Name: Baz, Value: 3.14159, Type: msoPropertyTypeFloat
Name: Qux, Value: False, Type: msoPropertyTypeBoolean

For a Word and Excel file with these properties:

Word Properties Excel Properties

I've not worked with these much at all, so I won't be able to go much deeper than this. Hope it helps.

like image 1
John Riehl Avatar answered Nov 13 '22 08:11

John Riehl