
Save the HTML of a website in a txt file with Python

I need to save the HTML code of any website in a txt file. It is a very easy exercise, but I am stuck, because I have a function that is supposed to do this:

import urllib.request

def get_html(url):
    f=open('htmlcode.txt','w')
    page=urllib.request.urlopen(url)
    pagetext=page.read()  # Read the HTML; it is written to the file on the next line
    f.write(pagetext)
    f.close()

But this doesn't work.

asked Jun 19 '14 01:06 by thecatbehindthemask



2 Answers

I use Python 3. After installing the requests library (pip install requests), you can save a web page to a txt file:

import requests

url = "https://stackoverflow.com/questions/24297257/save-html-of-some-website-in-a-txt-file-with-python"

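r = requests.get(url)
# Write as UTF-8 explicitly so the platform's default encoding cannot break the write
with open('file.txt', 'w', encoding='utf-8') as file:
    file.write(r.text)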

answered Oct 19 '22 19:10 by Serhii


The easiest way would be to use urlretrieve. For Python 2.x:

import urllib

urllib.urlretrieve("http://www.example.com/test.html", "test.txt")

For Python 3.x the code is as follows:

import urllib.request
urllib.request.urlretrieve("http://www.example.com/test.html", "test.txt")
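
As a hedged side note on why the function in the question fails under Python 3: urlopen(...).read() returns bytes, while a file opened with 'w' expects str, so f.write(pagetext) raises a TypeError. Below is a minimal sketch of one way around that, opening the file in binary mode instead. The path parameter and the returned byte count are additions for illustration, not part of the original function:

```python
import urllib.request


def get_html(url, path="htmlcode.txt"):
    """Download url and write the raw response body to path."""
    # In Python 3, urlopen(...).read() returns bytes, not str.
    with urllib.request.urlopen(url) as page:
        pagetext = page.read()
    # Binary mode ('wb') accepts bytes; text mode ('w') would raise TypeError.
    with open(path, "wb") as f:
        f.write(pagetext)
    # Returning the byte count is an illustrative extra.
    return len(pagetext)
```

Calling get_html("http://www.example.com/test.html") would then write the raw bytes of the response to htmlcode.txt. Writing bytes also avoids having to guess the page's character encoding.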

answered Oct 19 '22 21:10 by elyase