
FileStream.Seek vs. Buffered Reading

Tags: c#, file-io

Motivated by this answer, I was wondering what's going on under the hood if one makes lots of FileStream.Seek(-1, SeekOrigin.Current) calls.

For clarity I'll repost the answer:

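using (var fs = File.OpenRead(filePath))
{
    fs.Seek(0, SeekOrigin.End);

    int newLines = 0;
    // Walk backwards one byte at a time. Stop at the start of the file,
    // because seeking before position 0 throws an IOException.
    while (newLines < 3 && fs.Position > 0)
    {
        fs.Seek(-1, SeekOrigin.Current);
        newLines += fs.ReadByte() == 13 ? 1 : 0; // look for \r
        fs.Seek(-1, SeekOrigin.Current); // undo the advance made by ReadByte
    }

    byte[] data = new byte[fs.Length - fs.Position];
    fs.Read(data, 0, data.Length); // note: Read may return fewer bytes than requested
}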

Personally, I would have read something like 2048 bytes into a buffer and searched that buffer for the character.

Using Reflector I found out that internally the method is using SetFilePointer.

Is there any documentation about Windows caching and reading a file backwards? Does Windows buffer "backwards" and consult that buffer on consecutive Seek(-1) calls, or does it read ahead starting from the current position?

It's interesting that, on the one hand, most people agree that Windows does good caching, but on the other hand every answer to "reading a file backwards" involves reading chunks of bytes and operating on those chunks.

VVS asked Dec 06 '10 17:12



2 Answers

Going forward vs. backward doesn't usually make much difference. The file data is read into the file system cache on the first read, so after that you get a memory-to-memory copy on ReadByte(). That copy isn't sensitive to the file pointer value as long as the data is in the cache. The caching algorithm does, however, work from the assumption that you'd normally read sequentially: it tries to read ahead, as long as the file sectors are still on the same track. They usually are, unless the disk is heavily fragmented.

But yes, it is inefficient. You'll get hit with two P/Invoke and API calls for each individual byte. There's a fair amount of overhead in that; those same two calls could also read, say, 65 kilobytes with the same amount of overhead. As usual, fix this only when you find it to be a perf bottleneck.
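If it does turn out to be a bottleneck, the chunked approach could look roughly like the sketch below. It pays for one Seek and (normally) one Read per chunk and does the backwards search in memory. The TailReader class, the ReadTail method and the 64 KiB default chunk size are hypothetical names and values chosen for illustration. Like the code in the question, it returns everything from the third-to-last \r onward, or the whole file if there are fewer \r bytes.

```csharp
using System;
using System.IO;

static class TailReader
{
    // Hypothetical helper: returns the bytes from the lineCount-th '\r'
    // (counted from the end of the file) up to the end of the file.
    public static byte[] ReadTail(string path, int lineCount = 3, int chunkSize = 64 * 1024)
    {
        using (var fs = File.OpenRead(path))
        {
            var buffer = new byte[chunkSize];
            long chunkEnd = fs.Length; // exclusive end of the next chunk to scan
            long tailStart = 0;        // falls back to the whole file
            int found = 0;

            while (chunkEnd > 0 && found < lineCount)
            {
                int count = (int)Math.Min(chunkSize, chunkEnd);
                long chunkStart = chunkEnd - count;

                // One Seek per chunk instead of two per byte.
                fs.Seek(chunkStart, SeekOrigin.Begin);
                ReadFully(fs, buffer, count);

                // Search the chunk backwards, purely in memory.
                for (int i = count - 1; i >= 0; i--)
                {
                    if (buffer[i] == 13 && ++found == lineCount)
                    {
                        tailStart = chunkStart + i;
                        break;
                    }
                }
                chunkEnd = chunkStart;
            }

            fs.Seek(tailStart, SeekOrigin.Begin);
            var data = new byte[fs.Length - tailStart];
            ReadFully(fs, data, data.Length);
            return data;
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private static void ReadFully(Stream s, byte[] target, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int n = s.Read(target, offset, count - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
    }
}
```

A call such as TailReader.ReadTail(filePath) would replace the whole using block from the question. Whether it is measurably faster depends on the file size and on how warm the cache is, so it is worth profiling both versions.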

Hans Passant answered Nov 09 '22 17:11



Here is a pointer on File Caching in Windows

The behavior may also depend on where the file physically resides (hard disk, network, etc.) as well as on local configuration and optimization.

Another important source of information is the CreateFile API documentation: CreateFile Function

It has a good section named "Caching Behavior" that describes how you can influence file caching, at least in the unmanaged world.
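In the managed world, a similar hint can be passed through the FileOptions argument of the FileStream constructor. The sketch below is only an illustration: the CacheHints class and OpenForRandomAccess method are hypothetical names, and the 4096-byte buffer size is an assumed value. FileOptions.RandomAccess tells the system the file will be accessed randomly, while FileOptions.SequentialScan hints at front-to-back reading. Both are hints, so their effect should be measured rather than assumed.

```csharp
using System.IO;

static class CacheHints
{
    // Hypothetical helper: opens a file for reading and passes a caching hint
    // to the operating system via FileOptions.
    public static FileStream OpenForRandomAccess(string path, int bufferSize = 4096)
    {
        return new FileStream(
            path,
            FileMode.Open,
            FileAccess.Read,
            FileShare.Read,
            bufferSize,                // size of FileStream's own managed buffer
            FileOptions.RandomAccess); // hint: access pattern is not sequential
    }
}
```

The returned stream is used like any other FileStream, for example inside a using block in place of File.OpenRead(filePath).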

Simon Mourier answered Nov 09 '22 18:11
