
Faster way to get multiple FileInfo's?

Tags:

c#

file-io

winapi

This is a long shot, but is there a faster way to get the size, last access time, creation time, etc. for multiple files?

I have a long list of file paths (so I don't need to enumerate directories) and need to look up that information as quickly as possible. Creating FileInfo objects in parallel probably won't help much, since the bottleneck should be the disk.

Unfortunately the NTFS journal only keeps the file names; otherwise that would be great. I guess the OS doesn't store that metadata somewhere?

Another possible optimization: is there a static method or Win32 call that fetches the information without creating a bunch of FileInfo objects? (The File methods only let me get one piece of information at a time.)

Anyway, I'd be glad if anyone knows something that might help. Unfortunately I do have to micro-optimize here, and no, "use a database" isn't a viable answer ;)

Homde asked Dec 04 '10 09:12


3 Answers

There are static methods on System.IO.File to get what you want. It's a micro-optimization, but it might be what you need: GetLastAccessTime, GetCreationTime.

Edit

I'll leave the text above because you specifically asked for static methods. However, I think you are better off using FileInfo (you should measure just to be sure). Both File and FileInfo use an internal method on File called FillAttributeInfo to get the data you are after. For the properties you need, a FileInfo instance calls this method once and keeps the result. The static File methods have to call it again on every call, because the attribute data is discarded when each static method returns.

So my hunch is that when you need multiple attributes, a FileInfo for each file will be faster. But in performance situations you should always measure! Faced with this problem, I would try both managed options outlined above and benchmark them, both serially and in parallel. Then decide whether it's fast enough.
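A minimal sketch of such a comparison, for the three timestamps only (the static File class has no method for the file size). The class and method names here (`AttributeBenchmark`, `ViaFileInfo`, `ViaStaticFile`, `Time`) are made up for illustration, and a real benchmark would repeat each run several times and alternate the order, because the OS caches metadata after the first pass.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

static class AttributeBenchmark
{
    // One FileInfo per file: the attribute data is fetched on first
    // property access and reused for the other properties.
    public static List<(DateTime Created, DateTime Accessed, DateTime Written)> ViaFileInfo(
        IEnumerable<string> paths)
    {
        var rows = new List<(DateTime Created, DateTime Accessed, DateTime Written)>();
        foreach (string path in paths)
        {
            var info = new FileInfo(path);
            rows.Add((info.CreationTimeUtc, info.LastAccessTimeUtc, info.LastWriteTimeUtc));
        }
        return rows;
    }

    // Static File methods: every call queries the file system again.
    public static List<(DateTime Created, DateTime Accessed, DateTime Written)> ViaStaticFile(
        IEnumerable<string> paths)
    {
        var rows = new List<(DateTime Created, DateTime Accessed, DateTime Written)>();
        foreach (string path in paths)
        {
            rows.Add((File.GetCreationTimeUtc(path),
                      File.GetLastAccessTimeUtc(path),
                      File.GetLastWriteTimeUtc(path)));
        }
        return rows;
    }

    // Times a single run of the given action.
    public static TimeSpan Time(Action action)
    {
        var watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Usage would look like `TimeSpan t = AttributeBenchmark.Time(() => AttributeBenchmark.ViaFileInfo(paths));`, followed by the same call for `ViaStaticFile`, comparing the two elapsed times on your own file list.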

If it is not fast enough, you need to resort to calling the Win32 API directly. It wouldn't be too hard to look at File.FillAttributeInfo in the reference sources and come up with something similar.

2nd Edit

In fact, if you really need it, this is the code required to call the Win32 API directly. It uses the same approach as the internal code for File, but gets all the attributes with one OS call. I think you should only use it if it is really necessary. You'll have to convert from FILETIME to a usable DateTime yourself, combine the two halves of the file size, and so on, so there is some more manual work to do.

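using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.InteropServices;

static class FastFile
{
    private const int MAX_PATH = 260;
    private const int MAX_ALTERNATE = 14;

    // FindFirstFile signals failure with INVALID_HANDLE_VALUE (-1), not with a null handle.
    private static readonly IntPtr INVALID_HANDLE_VALUE = new IntPtr(-1);

    public static WIN32_FIND_DATA GetFileData(string fileName)
    {
        WIN32_FIND_DATA data;
        IntPtr handle = FindFirstFile(fileName, out data);
        if (handle == INVALID_HANDLE_VALUE)
            throw new IOException("FindFirstFile failed for " + fileName,
                                  new Win32Exception(Marshal.GetLastWin32Error()));
        FindClose(handle);
        return data;
    }

    // CharSet.Unicode selects FindFirstFileW, matching the Unicode layout of WIN32_FIND_DATA below.
    [DllImport("kernel32", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string fileName, out WIN32_FIND_DATA data);

    [DllImport("kernel32", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool FindClose(IntPtr hFindFile);

    [StructLayout(LayoutKind.Sequential)]
    public struct FILETIME
    {
        public uint dwLowDateTime;
        public uint dwHighDateTime;
    }

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct WIN32_FIND_DATA
    {
        public FileAttributes dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        // The size halves are unsigned DWORDs; declaring them as int would
        // turn a low half above 2 GB into a negative number.
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_PATH)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_ALTERNATE)]
        public string cAlternate;
    }
}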
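The manual conversion mentioned above could look roughly like this. It is a sketch with hypothetical helper names (`FindDataConversions`, `ToDateTimeUtc`, `ToFileSize`), and it assumes the 32-bit halves are available as unsigned values (cast with `unchecked((uint)value)` if a struct declares them as int).

```csharp
using System;

static class FindDataConversions
{
    // A FILETIME counts 100-nanosecond intervals since 1601-01-01 (UTC),
    // split into a low and a high 32-bit half.
    public static DateTime ToDateTimeUtc(uint lowDateTime, uint highDateTime)
    {
        long fileTime = ((long)highDateTime << 32) | lowDateTime;
        return DateTime.FromFileTimeUtc(fileTime);
    }

    // The file size is reported the same way: a high and a low 32-bit half.
    public static long ToFileSize(uint sizeHigh, uint sizeLow)
    {
        return ((long)sizeHigh << 32) | sizeLow;
    }
}
```

For example, `FindDataConversions.ToDateTimeUtc(data.ftLastWriteTime.dwLowDateTime, data.ftLastWriteTime.dwHighDateTime)` would give the last write time in UTC; call `ToLocalTime()` on the result if local time is needed.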
driis answered Nov 17 '22 03:11


.NET's DirectoryInfo and FileInfo classes are incredibly slow in this respect, especially when used with network shares.

If many of the files to be "scanned" are in the same directory, you'll get much faster results (depending on the situation, faster by orders of magnitude) by using the Win32 API's FindFirstFile, FindNextFile and FindClose functions. This holds even if you have to ask for more information than you actually need (e.g. if you ask for all ".log" files in a directory but only need 75% of them).

Actually, .NET's info classes also use these Win32 API functions internally, but they only "remember" the file names. When you ask for more information on a bunch of files (e.g. LastModified), a separate (network) request is made for each file, which takes time.
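A managed sketch of the same per-directory idea: group the wanted paths by directory, scan each directory once, and keep only the entries that were asked for. It relies on an assumption worth verifying for your runtime: as far as I know, in .NET Framework 4 and later the FileInfo objects returned by DirectoryInfo.EnumerateFiles are already filled in from the directory scan, so reading their properties should not trigger another request per file. The names `DirectoryBatchLookup` and `Lookup` are hypothetical, and the case-insensitive comparison assumes Windows-style paths.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class DirectoryBatchLookup
{
    // Returns a FileInfo for every requested path that exists,
    // keyed by full path. Each directory is enumerated only once.
    public static Dictionary<string, FileInfo> Lookup(IEnumerable<string> paths)
    {
        var result = new Dictionary<string, FileInfo>(StringComparer.OrdinalIgnoreCase);

        var byDirectory = paths
            .Select(p => Path.GetFullPath(p))
            .GroupBy(p => Path.GetDirectoryName(p), StringComparer.OrdinalIgnoreCase);

        foreach (var group in byDirectory)
        {
            var wanted = new HashSet<string>(group, StringComparer.OrdinalIgnoreCase);

            // Throws DirectoryNotFoundException if the directory is missing.
            foreach (FileInfo info in new DirectoryInfo(group.Key).EnumerateFiles())
            {
                if (wanted.Contains(info.FullName))
                    result[info.FullName] = info;
            }
        }
        return result;
    }
}
```

Whether this beats one lookup per file depends on how many of the files in each directory you actually need, so it is worth measuring against the per-file approach on your own data.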

Stefan Schultze answered Nov 17 '22 04:11


Is it possible to use the DirectoryInfo class?

 // In a verbatim string (@"...") a single backslash is enough.
 DirectoryInfo d = new DirectoryInfo(@"c:\Temp");
 FileInfo[] f = d.GetFiles();
TalentTuner answered Nov 17 '22 05:11