
Strange behavior with FileStream.WriteFile

I'm working on a program that does heavy random-access reads and writes on huge files (up to 64 GB). The files have a specific structure, and I've built a framework to access them. When I tested its performance, I noticed that sequential writes to a preallocated file are too slow to be acceptable. After many tests I reproduced the behavior without my framework, using only FileStream methods. Here is the code that reproduces the issue on my hardware:

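// Requires: using System; using System.Diagnostics; using System.IO;
FileStream fs = new FileStream("test1.vhd", FileMode.Open);
byte[] buffer = new byte[256 * 1024];   // 256 KB of random data per write
Random rand = new Random();
rand.NextBytes(buffer);
Stopwatch timer = new Stopwatch();      // accumulates time spent inside Write only

// Read the file header: 4 + 4 + 65536 * 2 = 131,080 bytes.
BinaryReader br = new BinaryReader(fs);
br.ReadUInt32();
br.ReadUInt32();
for (int i = 0; i < 65536; i++)
    br.ReadUInt16();

br = null;  // drop the reader without disposing it, so fs stays open

long startPos = 0;
long endPos = 4294967296;    // 4 GB
// No Seek is issued, so the writes start at the current stream position (just past the header).
for (long index = startPos; index < endPos; index += buffer.Length)
{
    timer.Start();
    fs.Write(buffer, 0, buffer.Length);
    timer.Stop();
}
double elapsed = timer.Elapsed.TotalMilliseconds;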

Unfortunately, the issue is intermittent: sometimes the code "works" and sometimes it doesn't. However, using Process Monitor I caught the following events:

Operation   Result  Detail
WriteFile   SUCCESS Offset: 1.905.655.816, Length: 262.144
WriteFile   SUCCESS Offset: 1.905.917.960, Length: 262.144
WriteFile   SUCCESS Offset: 1.906.180.104, Length: 262.144
WriteFile   SUCCESS Offset: 1.906.442.248, Length: 262.144
WriteFile   SUCCESS Offset: 1.906.704.392, Length: 262.144
WriteFile   SUCCESS Offset: 1.906.966.536, Length: 262.144
ReadFile    SUCCESS Offset: 1.907.228.672, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
WriteFile   SUCCESS Offset: 1.907.228.680, Length: 262.144
ReadFile    SUCCESS Offset: 1.907.355.648, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
ReadFile    SUCCESS Offset: 1.907.490.816, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
WriteFile   SUCCESS Offset: 1.907.490.824, Length: 262.144
ReadFile    SUCCESS Offset: 1.907.617.792, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
ReadFile    SUCCESS Offset: 1.907.752.960, Length: 32.768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
WriteFile   SUCCESS Offset: 1.907.752.968, Length: 262.144

That is, after overwriting almost 2 GB, FileStream.Write starts issuing a ReadFile after every WriteFile, and this continues until the end of the process. The offset at which the issue begins also seems to be random. I stepped through the FileStream.Write method in the debugger and verified that it is actually WriteFile (the Win32 API) that internally triggers the ReadFile.
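A quick arithmetic check on the trace above, offered as a sketch rather than a diagnosis: the BinaryReader calls in the code consume 4 + 4 + 65536 * 2 = 131,080 bytes, so the writes begin at that position if nothing seeks in between, and each logged WriteFile offset can be compared against a page boundary. The 4,096-byte page size and the `OffsetCheck` class are assumptions made for illustration; neither comes from the original post.

```csharp
using System;

static class OffsetCheck
{
    // Bytes consumed by the header reads in the question:
    // two ReadUInt32 calls plus 65536 ReadUInt16 calls.
    public const long HeaderBytes = 4 + 4 + 65536L * 2;

    // Distance of an offset from the previous page boundary.
    // pageSize is an assumption (4096 is typical on x86/x64 Windows).
    public static long RemainderInPage(long offset, long pageSize)
    {
        return offset % pageSize;
    }

    // True when the offset is reachable from the header position
    // by a whole number of fixed-size writes.
    public static bool IsWholeWritesPastHeader(long offset, long writeSize)
    {
        return offset >= HeaderBytes && (offset - HeaderBytes) % writeSize == 0;
    }
}
```

For the logged offset 1.907.228.680, `OffsetCheck.RemainderInPage(1907228680, 4096)` returns 8, which lines up with the ReadFile logged at 1.907.228.672 just before it. Whether that misalignment is what triggers the reads is not something the trace alone establishes.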

One last note: I don't think this is a file fragmentation issue, since I defragmented the file myself with contig.

Atropo asked Feb 02 '11 13:02


2 Answers

I believe this has to do with FileStream.Write / Read and a 2 GB limit. Are you running this in a 32-bit process? I could not find any specific documentation on this, but there is an MSDN forum question that sounds like the same problem. You could try running this in a 64-bit process.
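To answer the 32-bit question before changing anything else, the process can report its own bitness. This is a minimal sketch, assuming .NET Framework 4.0 or later (where `Environment.Is64BitProcess` is available); the `BitnessCheck` class name is hypothetical.

```csharp
using System;

static class BitnessCheck
{
    // Reports whether the current process is 64-bit, plus the pointer size
    // (4 bytes in a 32-bit process, 8 bytes in a 64-bit one).
    public static string Describe()
    {
        return string.Format(
            "64-bit process: {0}, pointer size: {1} bytes",
            Environment.Is64BitProcess,
            IntPtr.Size);
    }
}
```

Printing `BitnessCheck.Describe()` at startup shows which mode the test actually ran in, since a project built with an x86 or "prefer 32-bit" setting runs as a 32-bit process even on 64-bit Windows.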

I agree, however, that using a memory-mapped file may be a better approach.
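As a rough illustration of the memory-mapped approach, here is a minimal sketch that overwrites a range of an existing file through a sequence of mapped views, using `System.IO.MemoryMappedFiles` (available from .NET Framework 4.0). The `MappedWriteSketch` class, the `Overwrite` method, and the 64 MB default view size are assumptions for illustration, and nothing here has been measured against the slowdown described in the question.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedWriteSketch
{
    // Hypothetical default window size; views are kept small so that a
    // 32-bit process does not need address space for the whole file.
    public const long DefaultViewSize = 64L * 1024 * 1024;

    // Fills [start, start + length) of an existing file by repeating
    // the contents of buffer, one mapped view at a time.
    public static void Overwrite(string path, long start, long length,
                                 byte[] buffer, long viewSize = DefaultViewSize)
    {
        long end = start + length;
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        {
            for (long viewStart = start; viewStart < end; viewStart += viewSize)
            {
                long viewLength = Math.Min(viewSize, end - viewStart);
                using (MemoryMappedViewAccessor view =
                           mmf.CreateViewAccessor(viewStart, viewLength))
                {
                    for (long pos = 0; pos < viewLength; pos += buffer.Length)
                    {
                        // The last chunk of a view may be shorter than the buffer.
                        int count = (int)Math.Min(buffer.Length, viewLength - pos);
                        view.WriteArray(pos, buffer, 0, count);
                    }
                }
            }
        }
    }
}
```

A call such as `MappedWriteSketch.Overwrite("test1.vhd", 131080, 4294967296, buffer)` would mirror the range written by the loop in the question; the file must already exist and be at least that large, because a view cannot extend past the mapped capacity.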

Mike Ohlsen answered Oct 21 '22 04:10


I found this on MSDN. Could it be related? It sounds to me like each file has one globally shared pointer.

When a FileStream object does not have an exclusive hold on its handle, another thread could access the file handle concurrently and change the position of the operating system's file pointer that is associated with the file handle. In this case, the cached position in the FileStream object and the cached data in the buffer could be compromised. The FileStream object routinely performs checks on methods that access the cached buffer to assure that the operating system's handle position is the same as the cached position used by the FileStream object.

http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx

Bengie answered Oct 21 '22 03:10