
How to write the output to an HTML file with Python BeautifulSoup

I modified an HTML file by removing some of its tags using BeautifulSoup. Now I want to write the result back to an HTML file. My code:

from bs4 import BeautifulSoup
from bs4 import Comment

soup = BeautifulSoup(open('1.html'), "html.parser")

[x.extract() for x in soup.find_all('script')]
[x.extract() for x in soup.find_all('style')]
[x.extract() for x in soup.find_all('meta')]
[x.extract() for x in soup.find_all('noscript')]
[x.extract() for x in soup.find_all(text=lambda text: isinstance(text, Comment))]

html = soup.contents
for i in html:
    print i

html = soup.prettify("utf-8")
with open("output1.html", "wb") as file:
    file.write(html)

Because I used soup.prettify, the generated HTML puts every tag and text node on its own line, like this:

<p>
 <strong>
  BATAM.TRIBUNNEWS.COM, BINTAN
 </strong>
 - Tradisi pedang pora mewarnai serah terima jabatan pejabat di
 <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
  Polres
 </a>
 <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
  Bintan
 </a>
 , Senin (3/10/2016).
</p>

I want the file to contain the same output that print i produces:

<p><strong>BATAM.TRIBUNNEWS.COM, BINTAN</strong> - Tradisi pedang pora mewarnai serah terima jabatan pejabat di <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">Polres</a> <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">Bintan</a>, Senin (3/10/2016).</p> <p>Empat perwira baru Senin itu diminta cepat bekerja. Tumpukan pekerjaan rumah sudah menanti di meja masing masing.</p> 

How can I write the same result that print i gives (i.e. with each tag and its content on the same line)? Thanks.

Kim Hyesung asked Nov 10 '16 14:11



1 Answer

Just convert the soup instance to a string and write it:

with open("output1.html", "w") as file:
    file.write(str(soup))
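For reference, here is a minimal end-to-end sketch of the same idea, assuming Python 3 and a reasonably recent bs4 (the question's code is Python 2). The sample markup and the helper name strip_and_serialize are hypothetical stand-ins for the contents of 1.html and are not from the original post. It removes the same tags and comments as the question, then serializes with str(soup) instead of prettify(), so no extra line breaks or indentation are introduced.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical sample input standing in for the contents of 1.html.
SAMPLE_HTML = (
    "<html><head><meta charset='utf-8'><style>p {color: red}</style></head>"
    "<body><!-- a comment --><script>var x = 1;</script>"
    "<p><strong>BINTAN</strong> - <a href='http://example.com/'>Polres</a>, Senin.</p>"
    "</body></html>"
)


def strip_and_serialize(html):
    """Drop script/style/meta/noscript tags and comments, return compact HTML."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove the unwanted tags in one pass; find_all accepts a list of names.
    for tag in soup.find_all(["script", "style", "meta", "noscript"]):
        tag.extract()

    # Remove HTML comments (Comment is a subclass of NavigableString).
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()

    # str(soup) serializes the tree as-is, without prettify()'s reformatting.
    return str(soup)


cleaned = strip_and_serialize(SAMPLE_HTML)

# Text mode with an explicit encoding, since str(soup) is a str in Python 3.
with open("output1.html", "w", encoding="utf-8") as f:
    f.write(cleaned)
```

If the file has to be written in binary mode, as in the question's "wb" call, soup.encode("utf-8") returns bytes without the prettify() formatting.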
alecxe answered Sep 22 '22 07:09