Extracting text node inside a tag that has a child element in beautifulsoup4

Question

The HTML that I am parsing and scraping has the following code:

<li> <span> 929</span> Serve Returned </li>

How can I extract just the text node of the <li> ("Serve Returned" in this case) with BeautifulSoup?

.string doesn't work because <li> has a child element, and .text also includes the text inside <span>.
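A minimal sketch of why both attributes fall short, assuming bs4 is installed and using Python's built-in "html.parser" (the parser choice is an assumption; the question does not name one):

```python
from bs4 import BeautifulSoup

html = "<li> <span> 929</span> Serve Returned </li>"
soup = BeautifulSoup(html, "html.parser")
li = soup.li

# <li> has three children: the string ' ', the <span> tag, and ' Serve Returned '
print(li.contents)

# .string is None when a tag contains more than one child
string_value = li.string
print(string_value)  # None

# .text concatenates every descendant string, including the one nested in <span>
text_value = li.text
print(repr(text_value))  # '  929 Serve Returned '
```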

Hooked · Accepted Answer

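import bs4

html = "<li> <span> 929</span> Serve Returned </li>"
soup = bs4.BeautifulSoup(html, "html.parser")
# string=True keeps only text nodes; recursive=False looks only at direct children of <li>
print(soup.li.find_all(string=True, recursive=False))

This gives:

[' ', ' Serve Returned ']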

The first element is the "text" (here just a space) before the <span>. This method finds text before, after, and between any child elements.
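A short sketch extending the same call to text between several children, then stripping whitespace and joining the pieces. The markup below is hypothetical, and `string=` assumes bs4 4.4 or later (older code used `text=`, as the question's era did):

```python
from bs4 import BeautifulSoup

# Hypothetical markup with text before, between, and after two child tags
html = "<li>before <span>929</span> middle <b>x</b> after</li>"
soup = BeautifulSoup(html, "html.parser")
li = soup.li

# Direct text children only; strings nested inside <span> and <b> are skipped
direct = li.find_all(string=True, recursive=False)

# Strip surrounding whitespace and drop strings that were only whitespace
cleaned = [s.strip() for s in direct if s.strip()]
joined = " ".join(cleaned)
print(joined)  # before middle after
```

Applied to the question's markup, the same cleanup drops the lone " " element and leaves "Serve Returned".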

Totem · Answer

I used the str.replace method for this:

>>> li = soup.find('li')  # or however you need to drill down to the <li> tag
>>> mytext = li.text.replace(li.find('span').text, "").strip()
>>> print(mytext)
Serve Returned
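One caveat with str.replace: it removes every occurrence of the <span>'s text, including any that also appear in the <li>'s own text. The sketch below uses hypothetical markup to show this, then swaps in a different technique (deleting the child tags from a detached copy with `decompose()` before calling `get_text()`), so the original tree is left untouched:

```python
import copy

from bs4 import BeautifulSoup

# Hypothetical markup: the <span>'s text "9" also appears in the <li>'s own text
html = "<li> <span>9</span> Serve 9 Returned </li>"
soup = BeautifulSoup(html, "html.parser")
li = soup.find("li")

# str.replace strips both "9"s, not just the one that came from <span>
naive = li.text.replace(li.find("span").text, "").strip()
print(repr(naive))  # 'Serve  Returned'

# Copying a Tag copies its contents too, detached from the original tree
li_copy = copy.copy(li)
for child in li_copy.find_all("span"):
    child.decompose()  # delete the child tag and its text from the copy only

# strip=True strips each remaining string and skips whitespace-only ones
safer = li_copy.get_text(strip=True)
print(repr(safer))  # 'Serve 9 Returned'
```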

Tags: python, beautifulsoup, web-scraping

user3562812
