<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>
How can I print "I Like your face" instead of "I Like to punch your face"?
I tried this:
lala = soup.find_all('span')
for p in lala:
    if not p.find(class_='unwanted'):
        print p.text
but it gives "TypeError: find() takes no keyword arguments".
Tag.decompose() removes a tag from the tree of a given HTML document, then completely destroys it and its contents.
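As a minimal sketch of how decompose() could be applied to the question's markup: the snippet below removes every nested tag with the 'unwanted' class and then collapses the leftover whitespace. The variable names (html, outer, result) are illustrative assumptions, and the input is assumed to be well-formed.

```python
from bs4 import BeautifulSoup

html = "<span>I Like <span class='unwanted'> to punch </span> your face</span>"
soup = BeautifulSoup(html, "html.parser")

outer = soup.find("span")

# Destroy every nested tag carrying the 'unwanted' class, contents included.
for tag in outer.find_all(class_="unwanted"):
    tag.decompose()

# Collapse the remaining runs of whitespace into single spaces.
result = " ".join(outer.get_text().split())
print(result)
```

Unlike extract(), decompose() does not hand the removed tag back, so it fits cases where the unwanted content is never needed again.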
BeautifulSoup is a Python package that can parse broken HTML; lxml can do so as well, using the libxml2 parser.
You can use extract() to remove the unwanted tag before you get the text. But it keeps all the '\n' characters and spaces, so you will need some extra work to remove them.
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
from bs4 import BeautifulSoup as BS
soup = BS(data, 'html.parser')
external_span = soup.find('span')
print("1 HTML:", external_span)
print("1 TEXT:", external_span.text.strip())
unwanted = external_span.find('span')
unwanted.extract()
print("2 HTML:", external_span)
print("2 TEXT:", external_span.text.strip())
Result
1 HTML: <span>
I Like
<span class="unwanted"> to punch </span>
your face
<span></span></span>
1 TEXT: I Like
to punch
your face
2 HTML: <span>
I Like
your face
<span></span></span>
2 TEXT: I Like
your face
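One possible way to do the remaining whitespace cleanup is sketched below. It assumes the outer span is closed with </span>, and the names removed and cleaned are illustrative, not part of the answer above.

```python
from bs4 import BeautifulSoup

data = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>"""

soup = BeautifulSoup(data, "html.parser")
external_span = soup.find("span")

# extract() detaches the tag from the tree and returns it.
removed = external_span.find("span", class_="unwanted").extract()

# str.split() with no arguments splits on any whitespace run, including '\n',
# so joining the pieces with one space normalizes the text.
cleaned = " ".join(external_span.get_text().split())
print(cleaned)
```

Because extract() returns the detached tag, the removed text stays available in removed if it is needed later.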
You can skip every Tag object inside the external span and keep only the NavigableString objects (the plain text in the HTML).
data = '''<span>
I Like
<span class='unwanted'> to punch </span>
your face
<span>'''
from bs4 import BeautifulSoup as BS
import bs4
soup = BS(data, 'html.parser')
external_span = soup.find('span')
text = []
for x in external_span:
    if isinstance(x, bs4.element.NavigableString):
        text.append(x.strip())
print(" ".join(text))
Result
I Like your face
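A more compact variant of the same idea, offered as a sketch: find_all() with string=True and recursive=False should return only the text nodes that are direct children of the outer span. This assumes a bs4 version that accepts the string argument (older releases use text instead); the names direct_text, parts and sentence are illustrative.

```python
from bs4 import BeautifulSoup

data = """<span>
I Like
<span class='unwanted'> to punch </span>
your face
</span>"""

soup = BeautifulSoup(data, "html.parser")
external_span = soup.find("span")

# string=True matches text nodes; recursive=False limits the search to
# direct children, so text inside the nested span is never visited.
direct_text = external_span.find_all(string=True, recursive=False)

# Strip each piece and drop any that are whitespace only.
parts = [s.strip() for s in direct_text if s.strip()]
sentence = " ".join(parts)
print(sentence)
```

This leaves the parsed tree untouched, which may matter if the same soup object is reused afterwards.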