I am developing an app where I need to download a bunch of web pages, preferably as fast as possible. The way I do it right now is that I have multiple threads (hundreds), each with its own System.Net.HttpWebRequest. This sort of works, but I am not getting the performance I would like. I currently have a beefy 600+ Mb/s connection to work with, and it is utilized at 10% at most (at peaks). I guess my strategy is flawed, but I am unable to find any other good way of doing this.
Also: If the use of HttpWebRequest is not a good way to download web pages, please say so :)
The code has been semi-auto-converted from Java.
Thanks :)
Update:
public String getPage(String link){
    myURL = new System.Uri(link);
    myHttpConn = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(myURL);
    myStreamReader = new System.IO.StreamReader(
        new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
            System.Text.Encoding.Default).BaseStream,
        new System.IO.StreamReader(myHttpConn.GetResponse().GetResponseStream(),
            System.Text.Encoding.Default).CurrentEncoding);
    System.Text.StringBuilder buffer = new System.Text.StringBuilder();
    //myLineBuff is a String
    while ((myLineBuff = myStreamReader.ReadLine()) != null)
    {
        buffer.Append(myLineBuff);
    }
    return buffer.toString();
}
One problem is that it appears you're issuing each request twice:
myStreamReader = new System.IO.StreamReader(
    new System.IO.StreamReader(
        myHttpConn.GetResponse().GetResponseStream(),
        System.Text.Encoding.Default).BaseStream,
    new System.IO.StreamReader(
        myHttpConn.GetResponse().GetResponseStream(),
        System.Text.Encoding.Default).CurrentEncoding);
That expression makes two calls to GetResponse, so every page is downloaded twice. For reasons I fail to understand, you're also creating two stream readers. You can split that up and simplify it, and also do a better job of error handling:
var response = (HttpWebResponse)myHttpConn.GetResponse();
myStreamReader = new StreamReader(response.GetResponseStream(), Encoding.Default);
That should double your effective throughput.
Also, you probably want to make sure to dispose of the objects you're using. When you're downloading a lot of pages, you can quickly run out of resources if you don't clean up after yourself. In this case, you should call response.Close().  See http://msdn.microsoft.com/en-us/library/system.net.httpwebresponse.close.aspx
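To illustrate both points, here is a rough sketch of how the whole method could look with a single GetResponse call and using blocks that dispose the response, stream, and reader. It's only a sketch: the 30-second timeout and the UTF-8 encoding are assumptions, not something from your original code, and note that ReadToEnd keeps the newlines that your ReadLine loop dropped.

// Sketch only: one GetResponse call, and using blocks so the response,
// stream, and reader are always disposed even if an exception is thrown.
public String getPage(String link)
{
    var myURL = new System.Uri(link);
    var myHttpConn = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(myURL);
    myHttpConn.Timeout = 30000; // assumption: 30 s timeout so a slow server can't hold a thread forever

    using (var response = (System.Net.HttpWebResponse)myHttpConn.GetResponse())
    using (var stream = response.GetResponseStream())
    using (var reader = new System.IO.StreamReader(stream, System.Text.Encoding.UTF8)) // assumption: UTF-8
    {
        // ReadToEnd replaces the line-by-line loop; unlike the loop, it preserves line breaks.
        return reader.ReadToEnd();
    }
}

The UTF-8 encoding is a guess; ideally you'd pick the encoding from the response (e.g. via response.CharacterSet) rather than hard-coding it.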