I am parsing a web page with BeautifulSoup, and it has some elements like the following:
<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font> 16043646</td>
The structure always seems to be a <td> with the first part surrounded by <font><b>, and the text after the </font> tag can be empty. How can I get the text that comes after the font tag?
In this example I would want to get "16043646". If the HTML were instead
<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font></td>
I would want to get "" (an empty string).
>>> from BeautifulSoup import BeautifulSoup
>>> text1 = '<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font> 16043646</td>'
>>> text2 = '<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font></td>'
>>> BeautifulSoup(text1).td.font.nextSibling
u' 16043646'
>>> BeautifulSoup(text2).td.font.nextSibling
>>>
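In the session above, the second call prints nothing because nextSibling is None when no text follows the font tag, and the first result still carries its leading space. A minimal sketch of the same idea, assuming the newer bs4 package (where the attribute is spelled next_sibling) and the built-in "html.parser"; the helper name text_after_font is hypothetical and only for illustration:

```python
from bs4 import BeautifulSoup, NavigableString

text1 = '<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font> 16043646</td>'
text2 = '<td><font size="2" color="#00009c"><b>Consultant Registration Number :</b></font></td>'


def text_after_font(html):
    """Return the stripped text directly after the first <font> tag, or ""."""
    soup = BeautifulSoup(html, "html.parser")
    font = soup.find("font")
    if font is None:
        # No <font> tag at all: treat it like the empty case.
        return ""
    sibling = font.next_sibling
    # next_sibling is None when nothing follows </font>; it could also be
    # another tag, so only accept plain text nodes.
    if isinstance(sibling, NavigableString):
        return sibling.strip()
    return ""


print(repr(text_after_font(text1)))
print(repr(text_after_font(text2)))
```

Returning "" for the None case matches what the question asks for; whether to strip the whitespace is a judgment call that depends on the page being parsed.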