Inherent way to save web page source

Tags:

python

web-scraping

I have read a lot of answers about web scraping that recommend BeautifulSoup, Scrapy, etc.

Is there a way to do the equivalent of saving a page's source from a web browser?

That is, is there a way in Python to point it at a website and get it to save the page's source to a text file with just the standard Python modules?

Here is where I got to:

import urllib

f = open('webpage.txt', 'w')
html = urllib.urlopen("http://www.somewebpage.com")

#somehow save the web page source

f.close()

Not much, I know, but I'm looking for code that actually pulls the source of the page so I can write it to the file. I gather that urlopen just makes a connection.

Perhaps there is a readlines() equivalent for reading lines of a web page?

829

asked Nov 11 '12 14:11

Fusilli Jerry

1 Answer

You may try urllib2 (Python 2 only; in Python 3 its functionality moved to urllib.request):

import urllib2

page = urllib2.urlopen('http://stackoverflow.com')

page_content = page.read()

with open('page_content.html', 'w') as fid:
    fid.write(page_content)
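For Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request from the standard library. This is a minimal sketch, not a definitive implementation: the helper name save_page_source and the output file name are assumptions made for illustration. It writes the raw response bytes in binary mode, so the saved file matches what the server sent without guessing the page's encoding.

```python
from urllib.request import urlopen


def save_page_source(url, path):
    """Fetch `url` and write the raw response body to `path`.

    Hypothetical helper name, used only for illustration.
    Returns the number of bytes written.
    """
    # urlopen returns a file-like response object; read() pulls the whole body.
    with urlopen(url) as response:
        raw = response.read()

    # Binary mode avoids decoding, so no assumption about the charset is needed.
    with open(path, "wb") as fid:
        fid.write(raw)

    return len(raw)


if __name__ == "__main__":
    # Same URL as the answer above; the output file name is an assumption.
    save_page_source("http://stackoverflow.com", "page_content.html")
```

Because the response object is file-like, it also supports readlines() and plain iteration, which covers the line-by-line reading asked about in the question.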

173

answered Oct 19 '22 05:10

btel
