How do you use Python 2.6 to remove a <div class="comment"> ... </div> element from an HTML string, including the tags themselves and everything between them?
I tried various approaches using re.sub without any success.
Thank you
This can be done easily and reliably using an HTML parser like BeautifulSoup:
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup('<body><div>1</div><div class="comment"><strong>2</strong></div></body>')
>>> for div in soup.findAll('div', 'comment'):
...     div.extract()
...
<div class="comment"><strong>2</strong></div>
>>> soup
<body><div>1</div></body>
See this question for examples of why parsing HTML using regular expressions is a bad idea.
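If installing BeautifulSoup is not an option, the same parser-based idea can be sketched with only the standard library. This is a rough illustration, not a replacement for the answer above: the names CommentDivStripper and strip_comment_divs are made up for this example, it targets Python 3 (the module is html.parser there; in Python 2.6 it is called HTMLParser), it assumes the div tags are properly closed, and it silently drops HTML comments and doctype declarations.

```python
from html.parser import HTMLParser


def _is_comment_div(tag, attrs):
    # True for <div> tags whose class attribute contains "comment".
    classes = (dict(attrs).get("class") or "").split()
    return tag == "div" and "comment" in classes


class CommentDivStripper(HTMLParser):
    """Hypothetical helper: re-emits the input, skipping comment divs."""

    def __init__(self):
        # convert_charrefs=False so entities are passed through unchanged.
        HTMLParser.__init__(self, convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a div that is being removed

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            # Track nested divs so the matching </div> is found correctly.
            if tag == "div":
                self.skip_depth += 1
            return
        if _is_comment_div(tag, attrs):
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if not self.skip_depth and not _is_comment_div(tag, attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth -= 1
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append("&%s;" % name)

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append("&#%s;" % name)


def strip_comment_divs(html):
    # Feed the whole document through the parser and join what was kept.
    parser = CommentDivStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_comment_divs(
    '<body><div>1</div><div class="comment"><strong>2</strong></div></body>'
))
```

The depth counter is what a single re.sub pattern cannot easily reproduce: a div nested inside the comment div has its own closing tag, and the counter makes sure removal stops at the outer one. For real-world, possibly malformed HTML, BeautifulSoup remains the more forgiving choice.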