How do I parse an HTML string with BeautifulSoup that has inner tags within the text?

Question

I have the following HTML content in a variable and need a way to read the text from it while removing the inner tags:

html = '<td class="row">India (ASIA) (<a href="/asia/india">india</a> – <a href="/asia/india">photos</a>)</td>'

I just want to extract the string India (ASIA) from this with BeautifulSoup. Is that possible, or should I resort to regular expressions?

har07 · Accepted Answer

This is one possible way using BeautifulSoup: extract the text node that comes immediately before the first child <a> element:

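from bs4 import BeautifulSoup

html = """<td class="row">India (ASIA) (<a href="/asia/india">india</a>&nbsp;–&nbsp;<a href="/asia/india">photos</a>)</td>"""
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(html, "html.parser")
# The text node immediately before the first <a> holds the leading text
result = soup.find("a").previous_sibling
# On Python 3 the node is already a str, so no .decode() call is needed
print(result)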

Output:

India (ASIA) (

(Tweaking the code further to remove the trailing ( from the result should be straightforward.)
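Following up on the note about the trailing parenthesis, here is a minimal sketch of one way to do that cleanup. It assumes the wanted text always sits in the text node before the first <a> and always ends with a dangling "(". The variable names `cell`, `leading_text`, and `country` are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """<td class="row">India (ASIA) (<a href="/asia/india">india</a>&nbsp;–&nbsp;<a href="/asia/india">photos</a>)</td>"""

soup = BeautifulSoup(html, "html.parser")

# Locate the table cell, then the text node that precedes its first <a>
cell = soup.find("td", class_="row")
leading_text = cell.find("a").previous_sibling

# Convert to a plain str, then drop trailing whitespace, the dangling "(",
# and the space that was in front of it
country = str(leading_text).rstrip().rstrip("(").rstrip()

print(country)  # India (ASIA)
```

Because only the trailing "(" is stripped, the parentheses around ASIA are left intact. If the cell layout varies (for example, no <a> child at all), `cell.find("a")` returns None and the sketch would need a guard.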

Tags: python, beautifulsoup

Kshitiz Gupta
