I am developing an app where I need to download a bunch of web pages, preferably as fast as possible. The way that I do that right now is that I have multiple threads (100's) that have their own System.Net.HttpWebRequest
. This sort of works, but I am not getting the performance I would like. Currently I have a beefy 600+ Mb/s connection to work with, and this is only utilized at most 10% (at peaks). I guess my strategy is flawed, but I am unable to find any other good way of doing this.
Also: If the use of HttpWebRequest
is not a good way to download web pages, please say so :)
The code has been semi-auto-converted from java.
Thanks :)
Update:
public String getPage(String link){
myURL = new System.Uri(link);
myHttpConn = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(myURL);
myStreamReader = new System.IO.StreamReader(new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
System.Text.Encoding.Default).BaseStream,
new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
System.Text.Encoding.Default).CurrentEncoding);
System.Text.StringBuilder buffer = new System.Text.StringBuilder();
//myLineBuff is a String
while ((myLineBuff = myStreamReader.ReadLine()) != null)
{
buffer.Append(myLineBuff);
}
return buffer.toString();
}
One problem is that it appears you're issuing each request twice:
myStreamReader = new System.IO.StreamReader(
new System.IO.StreamReader(
myHttpConn.GetResponse().GetResponseStream(),
System.Text.Encoding.Default).BaseStream,
new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
System.Text.Encoding.Default).CurrentEncoding);
It makes two calls to GetResponse
. For reasons I fail to understand, you're also creating two stream readers. You can split that up and simplify it, and also do a better job of error handling...
var response = (HttpWebResponse)myHttpCon.GetResponse();
myStreamReader = new StreamReader(response.GetResponseStream(), Encoding.Default)
That should double your effective throughput.
Also, you probably want to make sure to dispose of the objects you're using. When you're downloading a lot of pages, you can quickly run out of resources if you don't clean up after yourself. In this case, you should call response.Close()
. See http://msdn.microsoft.com/en-us/library/system.net.httpwebresponse.close.aspx
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With