
Understand the find() function in Beautiful Soup

I know what I'm trying to do is simple, but it's causing me grief. I'd like to pull data from HTML using BeautifulSoup. To do that I need to use the .find() function properly. Here's the HTML I'm working with:

<div class="audit">

    <div class="profile-info">
        <img class="profile-pic" src="https://pbs.twimg.com/profile_images/471758097036226560/tLLeiOiL_normal.jpeg" />
        <h4>Ed Boon</h4>
        <span class="screen-name"><a href="http://www.twitter.com/noobde" target="_blank">@noobde</a></span>
    </div>

        <div class="followers">
            <div class="pie"></div>
            <div class="pie-data">
                <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
                <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
                <h6>Followers</h6>
            </div>
        </div>
        <div class="score">
            <img src="//twitteraudit-prod.s3.amazonaws.com/dist/f977287de6281fe3e1ef36d48d996fb83dd6a876/img/audit-result-good.png" />
            <div class="percentage good">
                69%
            </div>
            <h6>Audit score</h6>

The values I want are 73599 from data-value=73599, 32452 from data-value=32452, and the 69% from the "percentage good" div.

Using past code and online examples, this is what I have so far:

RealValue = soup.find("div", {"class":"real number"})['data-value']
FakeValue = soup.find("audit", {"class":"fake number"})['data-value']

Neither has worked so far. I'm also not sure how to craft the find() call to pull the 69% number.

asked Dec 16 '15 00:12 by OneManRiot



1 Answer

soup.find("div", {"class":"real number"})['data-value']

This searches for a div element, but in your example HTML the "real number" class is on a span. Try this instead:

soup.find("span", {"class": "real number", "data-value": True})['data-value']

The "data-value": True entry additionally requires the data-value attribute to be present on the element.
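As a minimal, self-contained sketch of the fix above, the following runs the corrected find() calls against a trimmed copy of the HTML from the question. The variable names and the choice of the built-in "html.parser" are assumptions made for illustration; any parser bs4 supports should behave the same way here.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the follower markup from the question.
html = """
<div class="pie-data">
    <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
    <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
    <h6>Followers</h6>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Search for the tag that actually carries the class: a span, not a div
# and not "audit" (which is a class name in the page, not a tag name).
real_value = soup.find("span", {"class": "real number", "data-value": True})["data-value"]
fake_value = soup.find("span", {"class": "fake number", "data-value": True})["data-value"]

# Attribute values are returned as strings; convert when a number is needed.
print(int(real_value), int(fake_value))  # prints: 73599 32452
```

When find() matches nothing it returns None, so the original calls fail with a TypeError at the ['data-value'] subscript rather than returning an empty result.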


To find elements having the "real number" or "fake number" classes, you can use a CSS selector:

for elm in soup.select(".real.number,.fake.number"):
    print(elm.get("data-value"))
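A short sketch of how the selector loop could collect both counts into a dictionary. The counts name and the idea of keying on the first class are illustrative assumptions, not part of the answer above. One reason to prefer the selector: ".real.number" matches the two classes in any order, whereas {"class": "real number"} compares against the exact attribute string.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the follower markup from the question.
html = """
<div class="pie-data">
    <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
    <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
</div>
"""

soup = BeautifulSoup(html, "html.parser")

counts = {}
for elm in soup.select(".real.number, .fake.number"):
    # elm["class"] is a list such as ["real", "number"]; in this markup
    # the first entry is the label we want as the dictionary key.
    label = elm["class"][0]
    counts[label] = int(elm.get("data-value"))

print(counts)  # prints: {'real': 73599, 'fake': 32452}
```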

To get the 69% value:

soup.find("div", {"class": "percentage good"}).get_text(strip=True)

Or, a CSS selector:

soup.select_one(".percentage.good").get_text(strip=True)
soup.select_one(".score .percentage").get_text(strip=True)

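Or locate the h6 element with the text "Audit score" and take its preceding div sibling (plain .previous_sibling would return the whitespace text between the two tags, not the div):

soup.find("h6", string="Audit score").find_previous_sibling("div").get_text(strip=True)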
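The approaches for the score can be compared side by side in a small sketch. The HTML in the question is cut off after the h6, so the closing </div> here is an assumption, and the img src is shortened for readability. The sibling lookup uses find_previous_sibling("div") because the node immediately before the h6 in this markup is whitespace text rather than the div.

```python
from bs4 import BeautifulSoup

# Score markup from the question; the final </div> is assumed, since the
# original snippet is truncated, and the img src is shortened.
html = """
<div class="score">
    <img src="audit-result-good.png" />
    <div class="percentage good">
        69%
    </div>
    <h6>Audit score</h6>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. Exact match on the class attribute string.
by_find = soup.find("div", {"class": "percentage good"}).get_text(strip=True)

# 2. CSS selector: any .percentage element inside .score.
by_css = soup.select_one(".score .percentage").get_text(strip=True)

# 3. Start from the heading and walk back to the nearest preceding div.
heading = soup.find("h6", string="Audit score")
by_sibling = heading.find_previous_sibling("div").get_text(strip=True)

print(by_find, by_css, by_sibling)  # prints: 69% 69% 69%

# Drop the percent sign if an integer is needed.
percent = int(by_find.rstrip("%"))
```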
answered Sep 29 '22 00:09 by alecxe