I am trying to download a large file (>1GB) from one server to another using HTTP. To do this I am making HTTP range requests in parallel. This lets me download the file in parallel.
When saving to disk I am taking each response stream, opening the same file as a file stream, seeking to the range I want and then writing.
However I find that all but one of my response streams times out. It looks like the disk I/O cannot keep up with the network I/O. However, if I do the same thing but have each thread write to a separate file it works fine.
For reference, here is my code writing to the same file:
int numberOfStreams = 4;
List<Tuple<int, int>> ranges = new List<Tuple<int, int>>();
string fileName = @"C:\MyCoolFile.txt";
//List populated here
Parallel.For(0, numberOfStreams, (index, state) =>
{
try
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("Some URL");
using(Stream responseStream = webRequest.GetResponse().GetResponseStream())
{
using (FileStream fileStream = File.Open(fileName, FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write))
{
fileStream.Seek(ranges[index].Item1, SeekOrigin.Begin);
byte[] buffer = new byte[64 * 1024];
int bytesRead;
while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
{
if (state.IsStopped)
{
return;
}
fileStream.Write(buffer, 0, bytesRead);
}
}
};
}
catch (Exception e)
{
exception = e;
state.Stop();
}
});
And here is the code writing to multiple files:
int numberOfStreams = 4;
List<Tuple<int, int>> ranges = new List<Tuple<int, int>>();
string fileName = @"C:\MyCoolFile.txt";
//List populated here
Parallel.For(0, numberOfStreams, (index, state) =>
{
try
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("Some URL");
using(Stream responseStream = webRequest.GetResponse().GetResponseStream())
{
using (FileStream fileStream = File.Open(fileName + "." + index + ".tmp", FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write))
{
fileStream.Seek(ranges[index].Item1, SeekOrigin.Begin);
byte[] buffer = new byte[64 * 1024];
int bytesRead;
while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
{
if (state.IsStopped)
{
return;
}
fileStream.Write(buffer, 0, bytesRead);
}
}
};
}
catch (Exception e)
{
exception = e;
state.Stop();
}
});
My question is this, is there some additional checks/actions that C#/Windows takes when writing to a single file from multiple threads that would cause the file I/O to be slower than when writing to multiple files? All disk operations should be bound by the disk speed right? Can anyone explain this behavior?
Thanks in advance!
UPDATE: Here is the error the source server is throwing:
"Unable to write data to the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." [System.IO.IOException]: "Unable to write data to the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." InnerException: "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond" Message: "Unable to write data to the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond." StackTrace: " at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)\r\n at System.Net.Security._SslStream.StartWriting(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)\r\n at System.Net.Security._SslStream.ProcessWrite(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)\r\n at System.Net.Security.SslStream.Write(Byte[] buffer, Int32 offset, Int32 count)\r\n
Unless you're writing to a striped RAID, you're unlikely to experience performance benefits by writing to the file from multiple threads concurrently. In fact, it's more likely to be the opposite – the concurrent writes would get interleaved and cause random access, incurring disk seek latencies that makes them orders of magnitude slower than large sequential writes.
To get a sense of perspective, look at some latency comparisons. A sequential 1 MB read from disk takes 20 ms; writes take approximately the same time. Each disk seek, on the other hands, takes around 10 ms. If your writes are interleaved at 4 KB chunks, then your 1 MB write will require an additional 2560 ms of seek time, making it 100 times slower than sequential.
I would suggest only allowing one thread to write to the file at any time, and use parallelism just for the network transfer. You can use a producer–consumer pattern where downloaded chunks are written to a bounded concurrent collection (such as BlockingCollection<T>
), which then get picked up and written to disk by a dedicated thread.
fileStream.Seek(ranges[index].Item1, SeekOrigin.Begin);
That Seek() call is a problem, you'll seek to a part of the file that's very far removed from the current end-of-file. Your next fileStream.Write() call forces the file system to extend the file on disk, filling the unwritten parts of it with zeros.
This can take a while, your thread will be blocked until the file system is done extending the file. Might well be long enough to trigger a timeout. You'd see this go wrong early at the start of the transfer.
A workaround is to create and fill the entire file before you start writing real data. Otherwise a very common strategy used by downloaders, you might have seen .part files before. Another nice benefit is that you have a decent guarantee that the transfer cannot fail because the disk ran out of space. Beware that filling a file with zeros is only cheap when the machine has enough RAM. 1 GB should not be a problem on modern machines.
Repro code:
using System;
using System.IO;
using System.Diagnostics;
class Program {
static void Main(string[] args) {
string path = @"c:\temp\test.bin";
var fs = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.Write);
fs.Seek(1024L * 1024 * 1024, SeekOrigin.Begin);
var buf = new byte[4096];
var sw = Stopwatch.StartNew();
fs.Write(buf, 0, buf.Length);
sw.Stop();
Console.WriteLine("Writing 4096 bytes took {0} milliseconds", sw.ElapsedMilliseconds);
Console.ReadKey();
fs.Close();
File.Delete(path);
}
}
Output:
Writing 4096 bytes took 1491 milliseconds
That was on an fast SSD, a spindle drive is going to take much longer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With