How to get HTML page source with C#

Tags:

c#

I want to save a complete web page (an ASP page) to a local drive as an .htm file, given its URL, but I have not succeeded.

Code

public StreamReader Fn_DownloadWebPageComplete(string link_Pagesource)
{
    //--------- Download Complete ------------------
    // using (WebClient client = new WebClient()) // WebClient class inherits IDisposable
    // {
    //     HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(link_Pagesource);
    //     webRequest.AllowAutoRedirect = true;
    //     var client1 = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(link_Pagesource);
    //     client1.CookieContainer = new System.Net.CookieContainer();
    //
    //     client.DownloadFile(link_Pagesource, @"D:\S1.htm");
    // }

    //--------- Download Page Source ------------------
    HttpWebRequest URL_pageSource = (HttpWebRequest)WebRequest.Create("https://www.digikala.com");

    URL_pageSource.Timeout = 360000;
    //URL_pageSource.Timeout = 1000000;
    URL_pageSource.ReadWriteTimeout = 360000;
    //URL_pageSource.ReadWriteTimeout = 1000000;
    URL_pageSource.AllowAutoRedirect = true;
    URL_pageSource.MaximumAutomaticRedirections = 300;

    using (WebResponse MyResponse_PageSource = URL_pageSource.GetResponse())
    {
        str_PageSource = new StreamReader(MyResponse_PageSource.GetResponseStream(), System.Text.Encoding.UTF8);
        pagesource1 = str_PageSource.ReadToEnd();
        success = true;
    }

Error:

Too many automatic redirections were attempted.

I tried the code above, but it did not work. It succeeds for many URLs, but not for this one.

RedArmy asked Jan 21 '17 10:01


1 Answer

Here is one way:

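string url = "https://www.digikala.com/";
string result; // declared outside the using blocks so it can be used afterwards

using (HttpClient client = new HttpClient())
using (HttpResponseMessage response = client.GetAsync(url).Result)
using (HttpContent content = response.Content)
{
    // .Result blocks the calling thread; prefer await inside an async method.
    result = content.ReadAsStringAsync().Result;
}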

The result variable will then contain the page's HTML, which you can save to a file like this:

System.IO.File.WriteAllText("path/filename.html", result);

Note: you need to import this namespace:

using System.Net.Http;
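
As a fuller sketch of the same approach, the download and save steps can be combined in an async helper so the calling thread is not blocked. The names PageDownloader, DownloadPageSourceAsync, and SavePageAsync are hypothetical and chosen only for illustration, and the helper assumes the caller supplies (and ideally reuses) an HttpClient instance.

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class; not part of any library.
public static class PageDownloader
{
    // Fetches the response body of a GET request as a string.
    public static async Task<string> DownloadPageSourceAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            // Throws HttpRequestException for non-success status codes.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }

    // Downloads the page and writes its HTML to the given file path.
    public static async Task SavePageAsync(HttpClient client, string url, string path)
    {
        string html = await DownloadPageSourceAsync(client, url);
        File.WriteAllText(path, html);
    }
}
```

From an async method this could be called as await PageDownloader.SavePageAsync(client, "https://www.digikala.com/", @"D:\S1.htm"); where client is an HttpClient created once and reused. Note that this saves only the HTML source, not the images, scripts, or stylesheets the page refers to.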

Update: if you are using an older version of Visual Studio, you can see this answer for using WebClient and WebRequest for the same purpose, but updating Visual Studio is the better solution.
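
For the WebRequest route, a minimal sketch might look like the following. It rests on an assumption that is not confirmed for this site: a frequently reported cause of "Too many automatic redirections were attempted" is a server that sets a cookie and keeps redirecting until the cookie is sent back, so the sketch attaches a CookieContainer to the request. The class name LegacyPageDownloader, its method names, the redirect limit of 10, and the User-Agent string are all hypothetical choices. WebRequest.Create still works but is marked obsolete in recent .NET versions.

```csharp
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class; not part of any library.
public static class LegacyPageDownloader
{
    // Builds a request that keeps cookies across redirects.
    public static HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // Without a container, cookies set during a redirect are discarded.
        request.CookieContainer = new CookieContainer();
        request.AllowAutoRedirect = true;
        request.MaximumAutomaticRedirections = 10; // illustrative limit
        // Assumption: some servers respond differently when no User-Agent is sent.
        request.UserAgent = "Mozilla/5.0 (compatible; PageDownloader)";
        return request;
    }

    // Reads the whole response body as UTF-8 text.
    public static string DownloadPageSource(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The returned string can be written to disk with System.IO.File.WriteAllText in the same way as above. If the redirect error persists with cookies enabled, the cause is likely something else, and this sketch will not help.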

Hakan Fıstık answered Oct 30 '22 06:10