
Getting Error "The remote server returned an error: (403) Forbidden" when screen scraping using HttpWebRequest.GetResponse()

We have a tool which checks whether a given URL is live. If it is, another part of our software can screen scrape the content from it.

This is my code for checking whether a URL is live:

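    public static bool IsLiveUrl(string url)
    {
        // For non-HTTP schemes (e.g. file://) WebRequest.Create returns another
        // request type, so the "as" cast yields null.
        HttpWebRequest webRequest = WebRequest.Create(url) as HttpWebRequest;
        if (webRequest == null)
        {
            return false;
        }
        webRequest.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5";
        webRequest.CookieContainer = new CookieContainer();
        try
        {
            // Dispose the response so the underlying connection is released.
            using (WebResponse webResponse = webRequest.GetResponse())
            {
                return true;
            }
        }
        catch (WebException)
        {
            // Thrown for 4xx/5xx responses as well as DNS and connection failures.
            return false;
        }
        catch (Exception)
        {
            return false;
        }
    }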

This code works perfectly, but for one particular site hosted on Apache I am getting a WebException with the following message: "The remote server returned an error: (403) Forbidden". On further inspection I found the following details in the WebException object:

Status="ProtocolError" StatusDescription="Bad Behaviour"

This is the request header:

    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5
    Host: scenicspares.co.uk
    Connection: Keep-Alive

This is the response header:

    Keep-Alive: timeout=4, max=512
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html
    Date: Thu, 13 Jan 2011 10:29:36 GMT
    Server: Apache

I extracted these headers using a watch in Visual Studio 2008. The framework in use is .NET 3.5.
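
For anyone who wants the same details without a debugger watch, here is a minimal sketch of reading them from the exception in code. The class and method names (`WebExceptionInspector`, `DescribeFailure`) are hypothetical and not part of the original tool. It only uses members that exist on `WebException` and `HttpWebResponse`.

```csharp
using System;
using System.Net;

public static class WebExceptionInspector
{
    // Hypothetical helper: summarises why a request failed.
    public static string DescribeFailure(WebException e)
    {
        // Response is null unless the server actually sent a reply,
        // e.g. when Status is ProtocolError.
        HttpWebResponse response = e.Response as HttpWebResponse;
        if (response == null)
        {
            return e.Status.ToString();
        }
        return string.Format("{0}: {1} {2}",
            e.Status, (int)response.StatusCode, response.StatusDescription);
    }
}
```

Calling `DescribeFailure(e)` from the `catch (WebException e)` block and logging the result should surface the status code and description shown above, which makes it easier to tell a server-side rejection apart from a dead host.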

Syed Salman Akbar asked Jan 13 '11 10:01


1 Answer

It turned out that all I needed to do was set the Accept header before calling GetResponse():

            webRequest.Accept = "*/*";
            webResponse = webRequest.GetResponse();

and it was fixed.
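
Put together, the check might look like the sketch below. This is not the exact production code: `UrlChecker` and `BuildRequest` are hypothetical names, and splitting request construction into its own method is only there so the headers can be inspected without touching the network. Why the header matters is an inference: the "Bad Behaviour" status description suggests a server-side filter that rejects requests missing headers a browser would normally send, but nothing in this thread confirms that.

```csharp
using System;
using System.Net;

public static class UrlChecker
{
    // Hypothetical helper: builds the request without sending it.
    public static HttpWebRequest BuildRequest(string url)
    {
        HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
        if (request == null)
        {
            throw new ArgumentException("Not an HTTP or HTTPS URL: " + url, "url");
        }
        request.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.6) Gecko/20060728 Firefox/1.5";
        request.Accept = "*/*"; // the header that was missing in the question
        request.CookieContainer = new CookieContainer();
        return request;
    }

    public static bool IsLiveUrl(string url)
    {
        try
        {
            // Dispose the response so the connection is released.
            using (WebResponse response = BuildRequest(url).GetResponse())
            {
                return true;
            }
        }
        catch (WebException) { return false; }        // HTTP errors, DNS or connection failures
        catch (ArgumentException) { return false; }    // null or non-HTTP URL
        catch (UriFormatException) { return false; }   // malformed URL
        catch (NotSupportedException) { return false; } // unregistered URI scheme
    }
}
```

Note that `GetResponse()` still throws for any 4xx or 5xx status, so a page that exists but answers with an error is reported as not live, same as in the original code.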

Syed Salman Akbar answered Sep 20 '22 00:09