Remove unnecessary repeated tags with BeautifulSoup

I am using Python and BeautifulSoup to extract some text from HTML. I have some HTML containing text of the form:

<h3><b> Abc </b><b> DEF </b> </h3>

I would like to remove the repeated <b> tag. Is there a quick way to do this?

saurabh asked Nov 04 '22 01:11


1 Answer

For bs4, this seems to work just fine: .text returns the text inside the <h3> with the inner <b> tags already dropped.

In [4]: soup.h3
Out[4]: <h3><b> Abc </b><b> DEF </b> </h3>

In [5]: soup.h3.text
Out[5]: u' Abc  DEF  '
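A self-contained Python 3 version of the session above, as a sketch. It assumes the built-in "html.parser" backend (the answer does not say which parser was used) and also shows get_text() with a separator and strip=True, a standard bs4 option not mentioned in the answer, for when the leftover whitespace is unwanted.

```python
from bs4 import BeautifulSoup

html = "<h3><b> Abc </b><b> DEF </b> </h3>"
# "html.parser" is Python's built-in parser, so no extra dependency is needed
soup = BeautifulSoup(html, "html.parser")

# .text concatenates every text node under <h3>, ignoring the <b> tags
print(repr(soup.h3.text))  # ' Abc  DEF  '

# With strip=True each text fragment is stripped, empty ones are skipped,
# and the rest are joined with the given separator
print(repr(soup.h3.get_text(" ", strip=True)))  # 'Abc DEF'
```

Under Python 3 the result is a plain str, so it prints without the u'' prefix seen in the Python 2 session.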

Check out the docs and the package here:
https://beautiful-soup-4.readthedocs.org/en/latest/
https://pypi.python.org/pypi/beautifulsoup4
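If the goal is to change the markup itself rather than only read its text, one possible approach (not part of the answer above) is to fold each <b> into an immediately preceding <b> sibling. This is a minimal sketch: merge_adjacent is a hypothetical helper, and it relies only on bs4's find_all(), append() and decompose().

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def merge_adjacent(soup, name="b"):
    """Hypothetical helper: fold each <name> tag into a directly preceding <name> sibling."""
    for tag in soup.find_all(name):
        prev = tag.previous_sibling
        if isinstance(prev, Tag) and prev.name == name:
            # Copy the list first: append() moves each child out of `tag`
            for child in list(tag.contents):
                prev.append(child)
            # Remove the now-empty repeated tag from the tree
            tag.decompose()
    return soup


soup = BeautifulSoup("<h3><b> Abc </b><b> DEF </b> </h3>", "html.parser")
print(merge_adjacent(soup))  # <h3><b> Abc  DEF </b> </h3>
```

Only <b> tags that are directly adjacent siblings are merged; two <b> tags separated by a whitespace text node are left as they are.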

dusual answered Nov 15 '22 05:11