Inherent way to save web page source

I have read a lot of answers about web scraping that recommend BeautifulSoup, Scrapy, etc.

Is there a way to do the equivalent of saving a page's source from a web browser?

That is, is there a way in Python to point it at a website and get it to save the page's source to a text file with just the standard Python modules?

Here is where I got to:

import urllib

f = open('webpage.txt', 'w')
html = urllib.urlopen("http://www.somewebpage.com")

#somehow save the web page source

f.close()

It's not much, I know, but I am looking for code that actually pulls the source of the page so I can write it to the file. I gather that urlopen just makes a connection.

Perhaps there is a readlines() equivalent for reading lines of a web page?

Fusilli Jerry asked Nov 11 '12 14:11




1 Answer

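You can use urllib2 (a Python 2 module; in Python 3 the same functionality lives in urllib.request):

import urllib2

# urlopen returns a file-like object; read() returns the whole page source
page = urllib2.urlopen('http://stackoverflow.com')
page_content = page.read()

# binary mode writes the bytes unchanged on every platform
with open('page_content.html', 'wb') as fid:
    fid.write(page_content)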
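
Since urllib2 exists only in Python 2, here is a minimal sketch of the same idea for Python 3, still using only the standard library. The function name save_page_source and its parameters are hypothetical names chosen for illustration; urllib.request.urlopen and shutil.copyfileobj are the standard-library calls doing the work.

```python
import shutil
import urllib.request


def save_page_source(url, path):
    """Fetch the resource at `url` and write its raw bytes to `path`.

    Hypothetical helper for illustration; not part of any library.
    """
    # urlopen returns a file-like response object. Copying it to a file
    # opened in binary mode saves the bytes exactly as they were received,
    # so no guess about the page's text encoding is needed.
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)


# Example call (needs network access, so it is left commented out):
# save_page_source("http://stackoverflow.com", "page_content.html")
```

As with the browser's "view source", this saves only the HTML the server returns; anything a page builds later with JavaScript will not be in the file. The response object can also be iterated line by line (for line in response), which is roughly the readlines() equivalent the question asks about, although each line is a bytes object that needs decoding before it is treated as text.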
btel answered Oct 19 '22 05:10