HttpWebRequest Timeouts After Ten Consecutive Requests

I'm writing a web crawler for a specific site. The application is a VB.Net Windows Forms application that does not use multiple threads - each web request is made consecutively. However, after ten successful page retrievals, every subsequent request times out.

I have reviewed the similar questions already posted here on SO and have incorporated the recommended techniques into my GetPage routine, shown below:

' Requires Imports System.Net and Imports System.IO at the top of the file.
Public Function GetPage(ByVal url As String) As String
    Dim result As String = String.Empty

    ' Raise the per-host connection limit for this site's ServicePoint.
    Dim uri As New Uri(url)
    Dim sp As ServicePoint = ServicePointManager.FindServicePoint(uri)
    sp.ConnectionLimit = 100

    Dim request As HttpWebRequest = DirectCast(WebRequest.Create(uri), HttpWebRequest)
    request.KeepAlive = False
    request.Timeout = 15000

    Try
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using dataStream As Stream = response.GetResponseStream()
                Using reader As New StreamReader(dataStream)
                    If response.StatusCode <> HttpStatusCode.OK Then
                        Throw New Exception("Got response status code: " & response.StatusCode.ToString())
                    End If
                    result = reader.ReadToEnd()
                End Using
            End Using
            response.Close()
        End Using

    Catch ex As Exception
        ' Log the failure and fall through to return an empty string.
        Dim msg As String = "Error reading page """ & url & """. " & ex.Message
        Logger.LogMessage(msg, LogOutputLevel.Diagnostics)
    End Try

    Return result

End Function

Have I missed something? Is there an object I should be closing or disposing of that I'm not? It seems strange that it always happens after ten consecutive requests.

Notes:

  1. In the constructor for the class in which this method resides I have the following (a minimal sketch of the constructor appears after these notes):

    ServicePointManager.DefaultConnectionLimit = 100

  2. If I set KeepAlive to true, the timeouts begin after five requests.

  3. All the requests are for pages in the same domain.
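
For reference, a minimal sketch of that constructor, assuming the class is called Crawler (the class name is hypothetical, not from the question):

Imports System.Net

Public Class Crawler

    Public Sub New()
        ' Raise the default per-host connection limit for every ServicePoint
        ' created after this point (the framework default is 2 for client apps).
        ServicePointManager.DefaultConnectionLimit = 100
    End Sub

End Class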

EDIT

I added a delay of between two and seven seconds between each web request so that I do not appear to be "hammering" the site or attempting a DoS attack. However, the problem still occurs.
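
A rough sketch of that kind of delay, in VB.Net (the field and method names are illustrative, not the actual code):

' Inside the crawler class: a single shared RNG so successive delays are not identically seeded.
Private Shared ReadOnly _rng As New Random()

Private Sub WaitBetweenRequests()
    ' Pause for a random interval of two to seven seconds before the next GetPage call.
    System.Threading.Thread.Sleep(_rng.Next(2000, 7001))
End Sub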

asked Jul 28 '09 by Bob Mc


3 Answers

I ran into this issue today and my resolution was to ensure that the response was closed at all times.

I think you need to call response.Close() before you throw your exception inside the Using block.

Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
    Using dataStream As Stream = response.GetResponseStream()
        Using reader As New StreamReader(dataStream)
            If response.StatusCode <> HttpStatusCode.OK Then
                ' Release the connection before bailing out.
                response.Close()
                Throw New Exception("Got response status code: " & response.StatusCode.ToString())
            End If
            result = reader.ReadToEnd()
        End Using
    End Using
    response.Close()
End Using
answered Oct 05 '22 by Geoff


I think the site has some sort of DoS protection that kicks in when it's hit with a number of rapid requests. You may want to try setting the UserAgent on the web request.
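
For example, a minimal sketch of setting the user agent inside GetPage (the exact string is arbitrary; any browser-like value would do):

Dim request As HttpWebRequest = DirectCast(WebRequest.Create(uri), HttpWebRequest)
' Identify as a browser; some sites throttle or block clients that send no User-Agent header.
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20100101 Firefox/5.0"
request.KeepAlive = False
request.Timeout = 15000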

answered Oct 05 '22 by Paul van Brenk


I used the following solution and it works for me. I hope it helps you too.

Declare "global" on the form the variables.

HttpWebRequest myHttpWebRequest;
HttpWebResponse myHttpWebResponse;

Then always call myHttpWebResponse.Close(); after each connection:

myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
// Close immediately so the underlying connection is released back to the pool.
myHttpWebResponse.Close();
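
Translated into the question's VB.Net, one possible reading of this advice is the sketch below (FetchPage is a hypothetical name; only the two field names come from the answer):

' Form-level fields, as the answer suggests.
Private myHttpWebRequest As HttpWebRequest
Private myHttpWebResponse As HttpWebResponse

Private Sub FetchPage(ByVal url As String)
    myHttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
    myHttpWebResponse = DirectCast(myHttpWebRequest.GetResponse(), HttpWebResponse)
    Try
        ' ... read the response stream here ...
    Finally
        ' Always release the connection, even if reading the response fails.
        myHttpWebResponse.Close()
    End Try
End Sub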
answered Oct 05 '22 by fernando roque