I'm trying to use BeautifulSoup on the following:
<h4>Hello<br /></h4>
<p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p>
For this example, let's say I have the <h4> tag saved in the variable tag. When I type print tag.text, the output is Hello, as expected.
However, when I use print tag.nextSibling, the output is nothing. When I type print tag.nextSibling.nextSibling, the output is <p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p>. What is going on? Why do I have to use .nextSibling twice to get to the <p> tag in my example? This happens consistently.
The find_next_sibling() function finds the sibling tag/element that follows a given tag/element. It returns only the first match after that tag/element.
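As a rough illustration of that behaviour, here is a minimal sketch. It assumes BeautifulSoup 4 (the bs4 package) with the built-in "html.parser", rather than the older BeautifulSoup 3 module used elsewhere in this post, and it reuses the HTML from the question.

```python
from bs4 import BeautifulSoup

# Same markup as in the question; note the newline between </h4> and <p>.
html = """<h4>Hello<br /></h4>
<p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p>"""

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("h4")

# find_next_sibling() only considers Tag siblings, so the newline text node
# between </h4> and <p> is skipped.
next_tag = tag.find_next_sibling()

# An optional tag name narrows the match to the first following <p> sibling.
paragraph = tag.find_next_sibling("p")

print(next_tag.name)
print(paragraph.get_text())
```

With this approach there is no need to chain two sibling lookups just to step over whitespace.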
A NavigableString object holds the text within an HTML or XML tag. It is a Python Unicode string with added methods for searching and navigation. Sometimes we may need to navigate to other tags or text within an HTML/XML document based on the current text.
Apparently, .nextSibling also picks up whitespace text. In the actual page I'm working with, there is whitespace between the <h4> and <p> tags, which is why I have to call .nextSibling twice.
Evidence
Writing:
print tag.__class__
print tag.nextSibling.__class__
print tag.nextSibling.nextSibling.__class__
Yields:
<class 'BeautifulSoup.Tag'>
<class 'BeautifulSoup.NavigableString'>
<class 'BeautifulSoup.Tag'>
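The same check can be sketched with BeautifulSoup 4, under a few assumptions: the bs4 package is installed, the attribute is spelled next_sibling (the bs4 name for nextSibling), and the markup is a shortened version of the HTML from the question. The helper next_tag_sibling is hypothetical and exists only for illustration.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# A newline separates the two tags, just like on the real page.
html = "<h4>Hello<br /></h4>\n<p>May 28, 1996</p>"
soup = BeautifulSoup(html, "html.parser")
tag = soup.find("h4")

first = tag.next_sibling       # the whitespace text node
second = first.next_sibling    # the <p> tag

# repr() makes the otherwise invisible whitespace visible when printed.
print(repr(first))
print(isinstance(first, NavigableString), isinstance(second, Tag))


def next_tag_sibling(node):
    """Hypothetical helper: follow .next_sibling until a Tag is found (or None)."""
    sibling = node.next_sibling
    while sibling is not None and not isinstance(sibling, Tag):
        sibling = sibling.next_sibling
    return sibling


print(next_tag_sibling(tag).name)
```

Printing the whitespace node directly shows what looks like an empty line, which is why the first .nextSibling call in the question seemed to output nothing.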