Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C# code for saving an entire web page? (with images/formatting)

Tags:

c#

http

I've been struggling to find an exmample of some C# code (I'm using C# Visual Studio 2008 Express) that can programmatically save an entire web page (given a URL) including the images and formatting (e.g. CSS). The intention is that in a subsequent phase I'd ship this off (not sure how yet) so it could be viewed later via a browser.

Is there an example of the most simple approach (leveraging the .NET Framework methods) to save an entire web page? Saving as one page with a subdirectory for images, or otherwise. Basically the same as what you get with browsers when you say "save entire web page".

like image 756
Greg Avatar asked Sep 16 '09 04:09

Greg


People also ask

What C is used for?

C programming language is a machine-independent programming language that is mainly used to create many types of applications and operating systems such as Windows, and other complicated programs such as the Oracle database, Git, Python interpreter, and games and is considered a programming foundation in the process of ...

What is the full name of C?

In the real sense it has no meaning or full form. It was developed by Dennis Ritchie and Ken Thompson at AT&T bell Lab. First, they used to call it as B language then later they made some improvement into it and renamed it as C and its superscript as C++ which was invented by Dr. Stroustroupe.

Is C language easy?

C is a general-purpose language that most programmers learn before moving on to more complex languages. From Unix and Windows to Tic Tac Toe and Photoshop, several of the most commonly used applications today have been built on C. It is easy to learn because: A simple syntax with only 32 keywords.

Why do we write C?

We write C for Carbon Because in some element the symbol of the element is taken form its first words and Co for Cobalt beacause in some elements the symbol of the element is taken from its first second letters, so that the we don't get confuse.


1 Answers

The simplest way is probably to add a WebBrowser Control to your application and point it at the page you want to save using the Navigate() method.

Then, when the document has loaded, call the ShowSaveAsDialog method. The user can then save the page as a single file, or a file with images in a subdirectory.

[Update]

Having now noticed "programatically" in your question, the above approach is not ideal as it requires either user involvement or delving into the Windows API to send input using SendKeys or similar.

There is nothing built-in to the .NET Framework that does all of what you ask.

So my approach revised would be:

  • Use System.NET.HttpWebRequest to get the main HTML document as a string or stream (easy).
  • Load this into a HTMLAgilityPack document where you can now easily query the document to get lists of all image elements, stylesheet links, etc.
  • Then make a separate web request for each of these files and save them to a subdirectory.
  • Finally update all relevent links in the main page to point to the items in the subdirectory.

In effect you would be implementing a very simple web browser. You may run into issues with pages that use JavaScript to dynamically alter or request page content, but for most pages this should give acceptable results.

like image 78
Ash Avatar answered Sep 26 '22 14:09

Ash