I would like to save a web page programmatically.
I don't mean merely save the HTML. I would also like automatically to store all associated files (images, CSS files, maybe embedded SWF, etc), and hopefully rewrite the links for local browsing.
The intended usage is a personal bookmarks application, in which link content is cached in case the original copy is taken down.
Take a look at wget, specifically the -p flag
-p, --page-requisites
    This option causes Wget to download all the files that are
    necessary to properly display a given HTML page. This includes
    such things as inlined images, sounds, and referenced stylesheets.
The following command:
wget -p http://<site>/1.html
will download 1.html and all the files it requires.
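Since you also want the links rewritten for local browsing, it is worth combining -p with a few related wget flags. A sketch (the URL is just a placeholder):

wget -E -H -k -p http://<site>/1.html

Here -k (--convert-links) rewrites the links in the downloaded files so they point at the local copies, -E (--adjust-extension) saves pages with a .html extension, and -H (--span-hosts) allows page requisites hosted on other domains (e.g. a CDN) to be fetched as well.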
On Windows, you can run Internet Explorer as a COM object and pull everything out of it.
Another option is to reuse the Mozilla source code.
In Java, there is Lobo.
Or you can use commons-httpclient and write a fair amount of code yourself, as sketched below.
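To give an idea of the "a lot of code" route, here is a minimal sketch using Commons HttpClient 3.x: it fetches one page, scans it for src/href attributes with a naive regex, and saves each absolute asset URL next to the page. The URL, class name, and regex are assumptions for illustration; a real implementation would resolve relative URLs, handle CSS imports, and rewrite the links in the saved HTML.

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageSnapshot {

    // Naive scan for src="..." / href="..." attributes; good enough for a sketch.
    private static final Pattern ASSET =
        Pattern.compile("(?:src|href)\\s*=\\s*[\"']([^\"']+)[\"']",
                        Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) throws Exception {
        String pageUrl = "http://example.com/1.html";   // placeholder URL
        HttpClient client = new HttpClient();

        // Fetch and save the page itself.
        String html = fetch(client, pageUrl);
        save("1.html", html.getBytes("UTF-8"));

        // Fetch each referenced asset (absolute URLs only in this sketch).
        Matcher m = ASSET.matcher(html);
        while (m.find()) {
            String ref = m.group(1);
            if (!ref.startsWith("http")) {
                continue;   // relative URLs would need resolving against pageUrl
            }
            String name = ref.substring(ref.lastIndexOf('/') + 1);
            if (name.length() == 0) {
                continue;
            }
            GetMethod get = new GetMethod(ref);
            try {
                client.executeMethod(get);
                save(name, get.getResponseBody());
            } finally {
                get.releaseConnection();
            }
        }
    }

    private static String fetch(HttpClient client, String url) throws Exception {
        GetMethod get = new GetMethod(url);
        try {
            client.executeMethod(get);
            return get.getResponseBodyAsString();
        } finally {
            get.releaseConnection();
        }
    }

    private static void save(String name, byte[] body) throws Exception {
        OutputStream out = new FileOutputStream(name);
        try {
            out.write(body);
        } finally {
            out.close();
        }
    }
}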