I modified an HTML file by removing some of the tags using BeautifulSoup. Now I want to write the results back to an HTML file. My code:
    from bs4 import BeautifulSoup
    from bs4 import Comment

    soup = BeautifulSoup(open('1.html'), "html.parser")
    [x.extract() for x in soup.find_all('script')]
    [x.extract() for x in soup.find_all('style')]
    [x.extract() for x in soup.find_all('meta')]
    [x.extract() for x in soup.find_all('noscript')]
    [x.extract() for x in soup.find_all(text=lambda text: isinstance(text, Comment))]
    html = soup.contents
    for i in html:
        print i
    html = soup.prettify("utf-8")
    with open("output1.html", "wb") as file:
        file.write(html)
Since I used soup.prettify, it generates html like this:
    <p>
     <strong>
      BATAM.TRIBUNNEWS.COM, BINTAN
     </strong>
     - Tradisi pedang pora mewarnai serah terima jabatan pejabat di
     <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">
      Polres
     </a>
     <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">
      Bintan
     </a>
     , Senin (3/10/2016).
    </p>
I want to get the result that print i produces:
    <p><strong>BATAM.TRIBUNNEWS.COM, BINTAN</strong> - Tradisi pedang pora mewarnai serah terima jabatan pejabat di <a href="http://batam.tribunnews.com/tag/polres/" title="Polres">Polres</a> <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan">Bintan</a>, Senin (3/10/2016).</p>
    <p>Empat perwira baru Senin itu diminta cepat bekerja. Tumpukan pekerjaan rumah sudah menanti di meja masing masing.</p>
How can I get the same result as print i (i.e., so that each tag and its content appear on the same line)? Thanks.
Try using a <br> tag and \n together: <br> is needed for the line break in the rendered web page, and \n (newline) is needed for the line break in the source. By the way, are you using the code above instead of message = j + ', '?
A BeautifulSoup object is created by passing the HTML data to the constructor; the second argument specifies the parser. Tags can then be accessed as attributes of the object: printing soup.h2 and soup.head outputs the HTML code of those two tags, and when there are multiple li elements, soup.li gives the first one.
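As a rough illustration of the description above, here is a minimal Python 3 sketch. The markup in DOC is made up for the example and is not from the original post; only the BeautifulSoup constructor and attribute-style tag access are taken from the text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
DOC = """<html><head><title>Header</title></head>
<body><h2>Operating systems</h2>
<ul><li>Solaris</li><li>FreeBSD</li></ul></body></html>"""

# First argument: the HTML data. Second argument: the parser to use.
soup = BeautifulSoup(DOC, "html.parser")

# Attribute access returns the first matching tag; str() gives its HTML.
h2_html = str(soup.h2)
head_html = str(soup.head)
first_li = str(soup.li)  # there are two li elements; this is the first

print(h2_html)
print(head_html)
print(first_li)
```

Note that str() on a tag keeps the tag and its content on one line, which is the same behaviour the question is asking for.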
The HTML content of the webpages can be parsed and scraped with Beautiful Soup.
Just convert the soup instance to a string and write it:

    with open("output1.html", "w") as file:
        file.write(str(soup))
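For anyone on Python 3, the same idea might look roughly like the sketch below. It assumes a small inline HTML string instead of 1.html; the SAMPLE markup and the helper names strip_and_serialize and save_html are hypothetical, introduced only for this example.

```python
from bs4 import BeautifulSoup, Comment

# Illustrative input; the original post reads its markup from '1.html'.
SAMPLE = (
    '<html><head><meta charset="utf-8"><script>var x = 1;</script></head>'
    '<body><!-- a comment --><p><strong>BINTAN</strong> - text '
    '<a href="http://example.com/" title="Polres">Polres</a>, end.</p>'
    '</body></html>'
)


def strip_and_serialize(markup):
    """Remove script/style/meta/noscript tags and comments, return compact HTML."""
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all(["script", "style", "meta", "noscript"]):
        tag.extract()
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()
    # str(soup) serializes without re-indenting, so each tag stays on the
    # same line as its content; prettify() would split them apart.
    return str(soup)


def save_html(path, html):
    """Write the HTML text to path, with an explicit encoding for Python 3."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


compact = strip_and_serialize(SAMPLE)
pretty = BeautifulSoup(SAMPLE, "html.parser").prettify()
print(compact)
```

Calling save_html("output1.html", compact) would then write the compact markup to disk. Passing encoding="utf-8" to open() is a choice made for this sketch so that non-ASCII text is written the same way on every platform.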