Save HTML of some website in a txt file with python

Tags: python, html, python-3.x, parsing, urllib

I need to save the HTML code of any website in a txt file. It is a very easy exercise, but I am having trouble with it. I have a function that does this:

import urllib.request

def get_html(url):
    f=open('htmlcode.txt','w')
    page=urllib.request.urlopen(url)
    pagetext=page.read() ## Save the html and later save in the file
    f.write(pagetext)
    f.close()

But this doesn't work.

308

asked Jun 19 '14 01:06

thecatbehindthemask

2 Answers

I use Python 3.
First install the requests library with pip install requests. After that you can save a web page to a txt file:

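import requests

url = "https://stackoverflow.com/questions/24297257/save-html-of-some-website-in-a-txt-file-with-python"

# Download the page; r.text is the response body decoded to str
r = requests.get(url)
# Write as UTF-8 explicitly so the result does not depend on the platform's default encoding
with open('file.txt', 'w', encoding='utf-8') as file:
    file.write(r.text)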
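
For reference, the function in the question fails because page.read() returns bytes while the file is opened in text mode ('w'), so f.write raises a TypeError. If installing a third-party package is not an option, something close to r.text can be sketched with the standard library alone by decoding the bytes before writing. The function name save_html_text, the path parameter, and the UTF-8 fallback are assumptions made for illustration, not part of the original answer:

```python
import urllib.request


def save_html_text(url, path="htmlcode.txt"):
    """Download `url` and save its HTML to `path` as UTF-8 text."""
    with urllib.request.urlopen(url) as page:
        raw = page.read()  # bytes, not str
        # Charset declared in the Content-Type header, if any.
        # Falling back to UTF-8 is an assumption, not a guarantee.
        charset = page.headers.get_content_charset() or "utf-8"
    # errors="replace" keeps going if a few bytes do not match the charset
    text = raw.decode(charset, errors="replace")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

Calling save_html_text("http://www.example.com/") should leave the decoded page in htmlcode.txt. Pages that declare their encoding only inside a meta tag are not handled by this sketch.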

118

answered Oct 19 '22 19:10

Serhii

The easiest way would be to use urlretrieve. In Python 2:

import urllib

urllib.urlretrieve("http://www.example.com/test.html", "test.txt")

For Python 3.x the code is as follows:

import urllib.request

urllib.request.urlretrieve("http://www.example.com/test.html", "test.txt")
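
urlretrieve writes the response bytes to disk unchanged. The same idea can be sketched with urlopen and a file opened in binary mode, which keeps the shape of the function from the question. The path parameter and the use of shutil.copyfileobj are illustrative choices here, not something the question or this answer specifies:

```python
import shutil
import urllib.request


def get_html(url, path="htmlcode.txt"):
    """Save the raw response body of `url` to `path`, byte for byte."""
    # "wb" is the key change from the question: the response is bytes,
    # so the file has to be opened in binary mode.
    with urllib.request.urlopen(url) as page, open(path, "wb") as f:
        # Copy in chunks instead of reading the whole page into memory
        shutil.copyfileobj(page, f)
```

Because nothing is decoded, the saved file keeps whatever encoding the server sent.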

answered Oct 19 '22 21:10

elyase
