Example:
Sometimes the HTML is:
<div id="1">
<div id="2">
this is the text i do NOT want
</div>
this is the text i want here
</div>
Other times it's just:
<div id="1">
this is the text i want here
</div>
I want only the text that sits directly inside the outer tag, ignoring the text of any child tags. If I use the .text
property, I get both.
Another possible approach (I would wrap it in a function):
def getText(parent):
    return ''.join(parent.find_all(text=True, recursive=False)).strip()
recursive=False indicates that you want only direct children, not nested ones, and text=True indicates that you want only text nodes.
Usage example:
from bs4 import BeautifulSoup
html = """<div id="1">
<div id="2">
this is the text i do NOT want
</div>
this is the text i want here
</div>
"""
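# Name the parser explicitly; omitting it makes Beautiful Soup guess one and emit a warning.
soup = BeautifulSoup(html, "html.parser")
print(getText(soup.div))
# this is the text i want here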
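For comparison, here is a minimal sketch of the same idea written without find_all: it walks the parent's immediate children and keeps only the text nodes. The helper name get_direct_text and the variables nested and flat are made up for illustration, and the snippet assumes the built-in "html.parser" backend. It is run against both HTML shapes from the question.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def get_direct_text(parent):
    # Keep only the text nodes that are immediate children of `parent`;
    # child tags (and any text nested inside them) are skipped.
    pieces = [child for child in parent.children
              if isinstance(child, NavigableString)]
    return ''.join(pieces).strip()


# The two shapes from the question: with and without a nested div.
nested = ('<div id="1"><div id="2">this is the text i do NOT want</div>'
          'this is the text i want here</div>')
flat = '<div id="1">this is the text i want here</div>'

for html in (nested, flat):
    soup = BeautifulSoup(html, "html.parser")
    print(get_direct_text(soup.div))
```

Both iterations should print only the wanted text. One caveat: HTML comments are also NavigableString subclasses, so a comment placed directly inside the parent would be included by this check.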