
Python BeautifulSoup find element that contains text

<div class="info">
       <h3> Height:
            <span>1.1</span>
       </h3>
</div>

<div class="info">
       <h3> Number:
            <span>111111111</span>
       </h3>
</div>

This is part of the page. Ultimately, I want to extract the 111111111. I know I can use soup.find_all("div", { "class" : "info" }) to get a list of both divs; however, I would prefer not to loop over them to check which one contains the text "Number".

Is there a more elegant way to extract "111111111", one that works like soup.find_all("div", { "class" : "info" }) but only matches divs that contain "Number"?

I also tried numberSoup = soup.find('h3', text='Number'), but it returns None.

lclankyo asked Oct 22 '25 at 02:10



1 Answer

You can write your own filter function and pass it as the argument to find_all. find_all calls it on every tag and keeps the tags for which it returns True.

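from bs4 import BeautifulSoup, NavigableString

def number_span(tag):
    # Match a <span> whose parent's first child is a text node containing "Number:"
    if tag.name != 'span':
        return False
    first = tag.parent.contents[0]
    return isinstance(first, NavigableString) and 'Number:' in first

# html holds the page markup shown in the question
soup = BeautifulSoup(html, 'html.parser')
tags = soup.find_all(number_span)
print(tags[0].get_text())  # 111111111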
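The same filter-function idea can also be applied at the div level, which is closer to what the question asks for: "find_all div.info, but only the one containing Number". The sketch below reproduces the question's markup so it runs on its own; the function name info_div_with_number is hypothetical.

```python
from bs4 import BeautifulSoup

# The question's markup, reproduced so the sketch runs on its own
html = """
<div class="info">
       <h3> Height:
            <span>1.1</span>
       </h3>
</div>
<div class="info">
       <h3> Number:
            <span>111111111</span>
       </h3>
</div>
"""

def info_div_with_number(tag):
    # A <div> whose class list includes "info" and whose text mentions "Number"
    return (
        tag.name == "div"
        and "info" in tag.get("class", [])
        and "Number" in tag.get_text()
    )

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all(info_div_with_number)
number = divs[0].span.get_text(strip=True)
print(number)  # 111111111
```

Note that get_text() includes all descendant text, so if div.info elements were nested, the outer one would match as well.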

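By the way, the reason you can't find the tag with the text argument is that text matches tags whose .string equals the given value. If a tag contains more than one child, it is not clear what .string should refer to, so .string is defined to be None. Your <h3> holds both a text node and a <span>, so soup.find('h3', text='Number') has nothing to compare against and returns None. (Newer versions of Beautiful Soup call this argument string.)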
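A small sketch to confirm this behaviour, and one way around it: searching with a compiled regular expression as the string argument matches the text node " Number: ..." itself (not the <h3>), and from there find_next can step to the following <span>. This regex approach is an alternative to the filter function, not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

html = """
<div class="info">
       <h3> Number:
            <span>111111111</span>
       </h3>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

h3 = soup.find("h3")
# The <h3> has more than one child (text, a <span>, trailing whitespace), so .string is None
print(h3.string)  # None

# Therefore an exact string match on the <h3> finds nothing
print(soup.find("h3", string="Number"))  # None

# A regex matches the text node itself; step from it to the next <span>
label = soup.find(string=re.compile(r"Number"))
value = label.find_next("span").get_text(strip=True)
print(value)  # 111111111
```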

See the Beautiful Soup documentation for details on .string and on passing functions to find_all.
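If the installed Beautiful Soup uses Soup Sieve for CSS selectors (an assumption: the :-soup-contains() pseudo-class needs Soup Sieve 2.1 or later), the whole lookup can also be written as a single selector. This is a swapped-in alternative, not the answer's method.

```python
from bs4 import BeautifulSoup

html = """
<div class="info"><h3> Height: <span>1.1</span></h3></div>
<div class="info"><h3> Number: <span>111111111</span></h3></div>
"""
soup = BeautifulSoup(html, "html.parser")

# div.info restricts to the question's divs; :-soup-contains("Number") keeps only
# the <h3> whose text includes "Number"; "> span" then selects its <span> child
span = soup.select_one('div.info h3:-soup-contains("Number") > span')
print(span.get_text())  # 111111111
```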

dokelung answered Oct 23 '25 at 14:10



