
using nextSibling from BeautifulSoup outputs nothing

I'm trying to use BeautifulSoup on the following:

<h4>Hello<br /></h4>
<p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p>

For this example, let's say I have the <h4> tag saved in the variable tag. When I type print tag.text the output is Hello, as expected.

However, when I use print tag.nextSibling, nothing is printed. When I type print tag.nextSibling.nextSibling, the output is <p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p>. What is going on? Why do I have to call .nextSibling twice to get to the <p> tag in my example? This happens consistently.

Tony Stark asked Apr 17 '11 00:04


People also ask

How do you get a sibling in Beautifulsoup?

The find_next_sibling() method finds the next sibling of a tag/element that matches the given criteria. It returns only the first match after the tag/element.
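
As a rough illustration of that method, here is a minimal sketch. It assumes Python 3 and the bs4 package (the current BeautifulSoup release, not the BeautifulSoup 3 module shown elsewhere on this page) with the built-in "html.parser"; the markup and variable names are made up for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tags are separated by newlines, as in most real pages.
html = "<h4>Hello</h4>\n<p>May 28, 1996</p>\n<p>Second paragraph</p>"
soup = BeautifulSoup(html, "html.parser")
h4 = soup.find("h4")

# The plain sibling attribute returns whatever node comes next,
# which here is the newline between the two tags.
print(repr(h4.next_sibling))        # '\n'

# find_next_sibling("p") skips text nodes and returns the first <p> sibling.
next_p = h4.find_next_sibling("p")
print(next_p)                       # <p>May 28, 1996</p>

# find_next_siblings("p") returns every following <p> sibling.
all_p = h4.find_next_siblings("p")
print(len(all_p))                   # 2
```

Filtering by tag name is what lets the call step over the whitespace; the bare sibling attribute applies no filter at all.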

What is NavigableString?

A NavigableString object holds the text within an HTML or an XML tag. This is a Python Unicode string with methods for searching and navigation. Sometimes we may need to navigate to other tags or text within an HTML/XML document based on the current text.


1 Answer

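Apparently, .nextSibling also returns whitespace text nodes, not just tags. In the actual page I'm working with, there is whitespace between the <h4> and <p> tags, so the first .nextSibling is that whitespace string, which is why I have to call .nextSibling twice to reach the <p> tag.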

Evidence

Writing:

print tag.__class__
print tag.nextSibling.__class__
print tag.nextSibling.nextSibling.__class__

Yields:

<class 'BeautifulSoup.Tag'>
<class 'BeautifulSoup.NavigableString'>
<class 'BeautifulSoup.Tag'>
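
For anyone reproducing this today, the same check can be sketched as follows. It assumes Python 3 and the bs4 package with the built-in "html.parser" rather than the BeautifulSoup 3 module used above; in bs4 the attribute is spelled next_sibling, and the class names print without the BeautifulSoup. prefix.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# The markup from the question, with a newline between the two tags.
html = (
    "<h4>Hello<br /></h4>\n"
    '<p><img src="http://url.goes.here" alt="hiya" class="img" />May 28, 1996</p>'
)
soup = BeautifulSoup(html, "html.parser")
tag = soup.h4

first = tag.next_sibling       # the whitespace between </h4> and <p>
second = first.next_sibling    # the <p> tag itself

print(type(tag).__name__)      # Tag
print(type(first).__name__)    # NavigableString
print(repr(first))             # '\n'
print(type(second).__name__)   # Tag
print(second.name)             # p
```

Printing repr() of the sibling instead of the sibling itself makes the "empty" output visible as a newline character.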

Tony Stark answered Nov 10 '22 01:11