
Open a webpage programmatically and retrieve its HTML content as a string

Tags:

html

c#

I have a Facebook account and I would like to extract my friends' photos and personal details such as "Date of birth", "Studied at" and so on. I am able to extract the address of the Facebook first page for each of my friends' accounts, but I don't know how to programmatically open each of those pages and save the HTML content as a string so that I can extract their personal details and photos. Please help! Thanks in advance!

user377338 asked Jan 19 '11 15:01


2 Answers

You have three options:

1- Using a WebClient object.

// WebClient lives in the System.Net namespace.
WebClient webClient = new WebClient();
webClient.Credentials = new System.Net.NetworkCredential("UserName", "Password", "Domain");
string pageHTML = webClient.DownloadString("http://url");

2- Using a WebRequest. This is the option I would recommend because it gives you more control over the request.

// Requires: using System.IO; using System.Net; using System.Text;
WebRequest myWebRequest = WebRequest.Create("http://URL");
string strResponse;
// The using blocks close the response, stream and reader even if an exception is thrown.
using (WebResponse myWebResponse = myWebRequest.GetResponse())
using (Stream receiveStream = myWebResponse.GetResponseStream())
using (StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8))
{
    strResponse = readStream.ReadToEnd();
}
// Optional: also save the HTML to disk. strFilePath is a file path you supply.
using (StreamWriter oSw = new StreamWriter(strFilePath))
{
    oSw.WriteLine(strResponse);
}

3- Using a WebBrowser (I bet you don't want to do that)

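WebBrowser wb = new WebBrowser();
string pageHTML = "";
// Subscribe before navigating so the completion event cannot be missed.
wb.DocumentCompleted += (sender, e) => pageHTML = wb.DocumentText;
wb.Navigate("http://URL");
// pageHTML is only filled in once DocumentCompleted fires, which needs a running Windows Forms message loop.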

Excuse me if I mistyped any code; I improvised it and I don't have a syntax checker at hand to verify it. But I think it should be fine.


EDIT: For Facebook pages, you may want to consider using the Facebook Graph API:

http://developers.facebook.com/docs/reference/api/

deadlock answered Nov 14 '22 23:11


Try this:

var html = new WebClient()
               .DownloadString("the facebook account url goes here");

Also, once you have downloaded the HTML as a string, I would highly recommend that you use the Html Agility Pack to parse it rather than picking the string apart by hand.
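As a rough sketch of what that parsing step could look like, the snippet below pulls the src of every img element out of an HTML string. It assumes the third-party HtmlAgilityPack NuGet package is installed; the class name ImageExtractor and the method name GetImageSources are made up for this illustration, and the XPath is only an example, since the real markup of the page you download will decide what you need to select.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack" (assumed to be installed)

// Hypothetical helper, not part of any library.
public static class ImageExtractor
{
    // Returns the src attribute of every <img> element that has one.
    public static List<string> GetImageSources(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var sources = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
        {
            return sources;
        }

        foreach (HtmlNode node in nodes)
        {
            sources.Add(node.GetAttributeValue("src", string.Empty));
        }
        return sources;
    }
}
```

You would call it with the string you downloaded, for example ImageExtractor.GetImageSources(html), and then loop over the returned list. The same pattern with a different XPath expression should work for other elements, such as the ones holding the personal details.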

Andrew Hare answered Nov 14 '22 23:11