I'm trying to extract the text inside the second <li> from the following HTML structure:
<div class="account-places">
<div>
<ul class="location-history">
<li></li>
<li>Text to extract</li>
</ul>
</div>
</div>
I have the following BeautifulSoup code to do it:
from bs4 import BeautifulSoup as bs
soup = bs(html, "lxml")
div = soup.find("div", {"class": "account-places"})
text = div.div.ul.li.next_sibling.get_text()
But Beautiful Soup is throwing the error: 'NavigableString' object has no attribute 'get_text'. What am I doing wrong?
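It looks like you need find_next_sibling("li"), which skips the whitespace text node between the two <li> tags and returns the next <li> tag.
Example: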
from bs4 import BeautifulSoup as bs
soup = bs(html, "lxml")
div = soup.find("div", {"class": "account-places"})
text = div.div.ul.li.find_next_sibling("li").get_text()
print(text)
Output:
Text to extract
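To see why this works, here is a minimal sketch using the HTML from the question. It assumes the built-in "html.parser" backend instead of lxml (chosen here only to avoid the extra dependency), and the variable names are illustrative:

```python
from bs4 import BeautifulSoup, NavigableString, Tag

html = """<div class="account-places">
<div>
<ul class="location-history">
<li></li>
<li>Text to extract</li>
</ul>
</div>
</div>"""

# Assumption: "html.parser" stands in for the "lxml" parser used in the question.
soup = BeautifulSoup(html, "html.parser")
first_li = soup.find("div", {"class": "account-places"}).div.ul.li

# .next_sibling returns the very next node, which here is the newline
# between the two <li> tags, not the second <li> itself.
sibling = first_li.next_sibling
print(type(sibling).__name__, repr(sibling))

# find_next_sibling("li") skips text nodes and returns the next <li> Tag.
second_li = first_li.find_next_sibling("li")
text = second_li.get_text()
print(text)
```

If the HTML had no whitespace between the tags, next_sibling would land on the second <li> directly, so find_next_sibling("li") is the safer choice because it does not depend on formatting.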
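The next_sibling call returns a NavigableString (here, the newline between the two <li> tags), not a Tag, which is why get_text() is unavailable. A NavigableString can be converted to a plain string with str() on Python 3 or unicode() on Python 2, although for this HTML the result is only that whitespace:
text = str(div.div.ul.li.next_sibling)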
To quote the documentation:
A NavigableString is just like a Python Unicode string, except that it also supports some of the features described in Navigating the tree and Searching the tree. You can convert a NavigableString to a Unicode string with unicode()
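As a rough illustration of that conversion on Python 3, the sketch below uses str() in place of unicode() and the HTML from the question. The "html.parser" backend and the variable names are assumptions made for the example:

```python
from bs4 import BeautifulSoup

html = """<div class="account-places">
<div>
<ul class="location-history">
<li></li>
<li>Text to extract</li>
</ul>
</div>
</div>"""

# Assumption: "html.parser" stands in for the "lxml" parser used in the question.
soup = BeautifulSoup(html, "html.parser")
first_li = soup.find("div", {"class": "account-places"}).div.ul.li

# Converting the immediate sibling gives only the newline between the tags.
whitespace = str(first_li.next_sibling)

# Stepping over that text node reaches the second <li>; its .string is a
# NavigableString holding the text, which str() turns into a plain str.
second_li = first_li.next_sibling.next_sibling
text = str(second_li.string)
print(repr(whitespace), text)
```

Chaining next_sibling twice only works while exactly one text node sits between the two tags, so it is more fragile than searching for the next <li> by name.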