I wrote a tiny HTML parser in Python using lxml. It's very useful, but I have a problem.
I have the following code:
tags = doc.xpath('//table//tr/td[@align="right"]/b')
for x in tags:
    print(x.text.strip())
It works fine. But if there is a <br> tag inside a <b> element, like this:
<b> first-half <br>
second-half </b>
this code prints only first-half, not the rest of the text in the <b> tag.
How can I get all of the text in <b>, even if there is a <br> tag inside it?
Thanks.
Use text_content() to extract all of the non-markup text within a tag. Replace x.text with x.text_content().
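For illustration, here is a minimal sketch of the difference, assuming the page is parsed with lxml.html (text_content() is a method of lxml.html elements, not of plain lxml.etree ones). The SAMPLE markup and the variable names first_only, full_text, normalized and via_itertext are made up for this example; only the XPath expression comes from the question.

```python
from lxml import html

# Hypothetical sample markup that mirrors the structure in the question.
SAMPLE = """
<table>
  <tr>
    <td align="right"><b> first-half <br> second-half </b></td>
  </tr>
</table>
"""

doc = html.fromstring(SAMPLE)
tags = doc.xpath('//table//tr/td[@align="right"]/b')

# .text holds only the text before the first child element (<br> here);
# the text after <br> is stored on the <br> element as its .tail.
first_only = [x.text.strip() for x in tags]

# text_content() concatenates every text node inside the element.
full_text = [x.text_content().strip() for x in tags]

# Optional: collapse the whitespace left around the <br>.
normalized = [" ".join(x.text_content().split()) for x in tags]

# itertext() is an alternative that also works on plain lxml.etree elements.
via_itertext = ["".join(x.itertext()).strip() for x in tags]

print(first_only)   # ['first-half']
print(normalized)   # ['first-half second-half']
```

If the document was parsed with lxml.etree instead, the itertext() line is probably the closest equivalent, since those elements do not have text_content().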