
Screen-scraping a site with an ASP.NET form login in C#?

Would it be possible to write a screen scraper for a website protected by a form login? I have access to the site, of course, but I have no idea how to log in to the site and keep my credentials for later requests in C#.

Also, any good examples of screen scrapers in C# would be hugely appreciated.

Has this already been done?

asked May 23 '09 at 07:05 by bdd



2 Answers

It's pretty simple. You need a custom login (HTTP POST) method.

You can come up with something like this. It collects all the cookies set at login, and you then just need to pass them to the next HttpWebRequest:

// Requires: using System.IO; using System.Net; using System.Text;
public static HttpWebResponse HttpPost(String url, String referer, String userAgent, ref CookieCollection cookies, String postData, out WebHeaderCollection headers, WebProxy proxy)
{
    try
    {
        HttpWebRequest http = (HttpWebRequest)WebRequest.Create(url);
        http.Proxy = proxy;
        http.AllowAutoRedirect = true;
        http.Method = "POST";
        http.ContentType = "application/x-www-form-urlencoded";
        http.UserAgent = userAgent;
        // Send the cookies collected so far with this request.
        http.CookieContainer = new CookieContainer();
        http.CookieContainer.Add(cookies);
        http.Referer = referer;
        // Write the URL-encoded form fields as the request body.
        byte[] dataBytes = Encoding.UTF8.GetBytes(postData);
        http.ContentLength = dataBytes.Length;
        using (Stream postStream = http.GetRequestStream())
        {
            postStream.Write(dataBytes, 0, dataBytes.Length);
        }
        HttpWebResponse httpResponse = (HttpWebResponse)http.GetResponse();
        // Expose the response headers to the caller.
        headers = httpResponse.Headers;
        // Keep the cookies set by the response for the next request.
        cookies.Add(httpResponse.Cookies);

        return httpResponse;
    }
    catch (WebException)
    {
        // The request failed (network error or HTTP error status).
        headers = null;
        return null;
    }
}
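
As a complement to the method above, here is a minimal sketch of the "pass the cookies to the next request" step. It is a variation, not the code from this answer: instead of handing a CookieCollection around by reference, it reuses one CookieContainer for every request, so whatever the login response sets is sent automatically afterwards. The class name ScraperSession, its methods, and the example URLs in the comments are hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper: one CookieContainer shared by all requests in a session.
public class ScraperSession
{
    private readonly CookieContainer cookieJar = new CookieContainer();

    // Cookies collected so far (for inspection or debugging).
    public CookieContainer Cookies
    {
        get { return cookieJar; }
    }

    // Creates a request that shares the session's cookie container.
    private HttpWebRequest CreateRequest(string url, string method)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.CookieContainer = cookieJar;
        request.AllowAutoRedirect = true;
        return request;
    }

    // Sends the request and reads the whole response body as text.
    private static string ReadBody(HttpWebRequest request)
    {
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // POSTs URL-encoded form data, for example the login form.
    public string Post(string url, string postData)
    {
        HttpWebRequest request = CreateRequest(url, "POST");
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] dataBytes = Encoding.UTF8.GetBytes(postData);
        request.ContentLength = dataBytes.Length;
        using (Stream postStream = request.GetRequestStream())
        {
            postStream.Write(dataBytes, 0, dataBytes.Length);
        }
        return ReadBody(request);
    }

    // GETs a page, sending any cookies received earlier in the session.
    public string Get(string url)
    {
        return ReadBody(CreateRequest(url, "GET"));
    }
}

// Usage (placeholder URLs and field names, adjust to the real site):
//   var session = new ScraperSession();
//   session.Post("https://example.com/Login.aspx", "username=me&password=secret");
//   string html = session.Get("https://example.com/Members/Report.aspx");
```

Sharing the container keeps the calling code short. The by-reference CookieCollection used above gives more control if you want to inspect or store the cookies between requests.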
answered Sep 29 '22 at 15:09 by Lukas Šalkauskas


Sure, this has been done. I have done it a couple of times. This is (generically) called screen scraping or web scraping.

You should take a look at this question (and also browse the questions under the tag "screen-scraping"). Note that scraping is not only about extracting data from a web resource. It also involves submitting data to online forms so as to mimic the actions of a user providing input, such as filling in a login form.
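
To make the form-submission part concrete, here is a rough sketch of how the POST body for a login form might be assembled. ASP.NET Web Forms pages carry hidden fields such as __VIEWSTATE and __EVENTVALIDATION that normally have to be sent back with the credentials, so the sketch first pulls the hidden inputs out of the login page's HTML. The class LoginForm, its method names, and the field names in the usage comment are hypothetical. The regular expression assumes the attributes appear in the order type, name, value, which is how Web Forms usually renders them. A proper HTML parser is safer for anything less regular.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper for building the body of a login POST.
public static class LoginForm
{
    // Matches <input type="hidden" name="..." ... value="..."> in that attribute order.
    private static readonly Regex HiddenInput = new Regex(
        "<input[^>]*type=\"hidden\"[^>]*name=\"([^\"]*)\"[^>]*value=\"([^\"]*)\"",
        RegexOptions.IgnoreCase);

    // Collects the name/value pairs of the hidden inputs found in the HTML.
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match match in HiddenInput.Matches(html))
        {
            // Attribute values are HTML-encoded in the page source.
            fields[match.Groups[1].Value] = WebUtility.HtmlDecode(match.Groups[2].Value);
        }
        return fields;
    }

    // Joins the fields into an application/x-www-form-urlencoded string.
    public static string BuildPostData(IDictionary<string, string> fields)
    {
        var body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (body.Length > 0)
            {
                body.Append('&');
            }
            body.Append(WebUtility.UrlEncode(field.Key));
            body.Append('=');
            body.Append(WebUtility.UrlEncode(field.Value));
        }
        return body.ToString();
    }
}

// Usage (field names are placeholders; take the real ones from the page source):
//   var fields = LoginForm.ExtractHiddenFields(loginPageHtml);
//   fields["UserName"] = "me";
//   fields["Password"] = "secret";
//   string postData = LoginForm.BuildPostData(fields);
```

The resulting string is what you would send as the body of the login POST. The exact names of the user name, password, and submit button fields vary from site to site.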

answered Sep 29 '22 at 15:09 by Cerebrus