
How to use Python Beautiful Soup to get only the level 1 NavigableString?

I am using Beautiful Soup to get the text from this example HTML code:

....
<div style="s1">
    <div style="s2">Here is text 1</div>
    <div style="s3">Here is text 2</div>
Here is text 3 and this is what I want.
</div>
....

Text 1 and text 2 are both at level 2, while text 3 is one level up, at level 1. I only want text 3, and tried this:

for anchor in tbody.findAll('div', style="s1"):
    review = anchor.text
    print(review)

But this code gets me all of text 1, 2, and 3. How do I get only the first-level text 3?

user2437712 asked Nov 28 '25 06:11


1 Answer

Something like this works:

import bs4

for anchor in tbody.findAll('div', style="s1"):
    # Keep only the strings that are direct children of the div
    text = ''.join([x for x in anchor.contents if isinstance(x, bs4.element.NavigableString)])

Just know that you'll also get the line breaks in there, so calling .strip() on the result might be necessary.

For example:

for anchor in tbody.findAll('div', style="s1"):
    text = ''.join([x for x in anchor.contents if isinstance(x, bs4.element.NavigableString)])
    print([text])
    print([text.strip()])

This prints:

[u'\n\n\nHere is text 3 and this is what I want.\n']
[u'Here is text 3 and this is what I want.']

(I put them in lists so you could see the newlines.)
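
If you want to try this without the rest of the page, here is a minimal self-contained sketch of the same idea. It assumes the bs4 package is installed and uses the built-in "html.parser"; the HTML constant and the direct_text helper are names made up for this illustration, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Sample markup copied from the question (hypothetical constant name).
HTML = """
<div style="s1">
    <div style="s2">Here is text 1</div>
    <div style="s3">Here is text 2</div>
Here is text 3 and this is what I want.
</div>
"""


def direct_text(tag):
    """Join only the strings that are direct children of tag, then trim whitespace."""
    return "".join(
        child for child in tag.contents if isinstance(child, NavigableString)
    ).strip()


soup = BeautifulSoup(HTML, "html.parser")

# One entry per matching outer div; the nested divs' text is skipped.
results = [direct_text(div) for div in soup.find_all("div", style="s1")]
print(results)
```

Note that HTML comments are also NavigableString instances in bs4, so if the div contains comments you may want to filter those out as well.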

jedwards answered Nov 30 '25 19:11