Understand the Find() function in Beautiful Soup

Tags: python, html, beautifulsoup

I know what I'm trying to do is simple, but it's causing me grief. I'd like to pull data from HTML using BeautifulSoup, and to do that I need to use the .find() function properly. Here's the HTML I'm working with:

<div class="audit">

    <div class="profile-info">
        <img class="profile-pic" src="https://pbs.twimg.com/profile_images/471758097036226560/tLLeiOiL_normal.jpeg" />
        <h4>Ed Boon</h4>
        <span class="screen-name"><a href="http://www.twitter.com/noobde" target="_blank">@noobde</a></span>
    </div>

        <div class="followers">
            <div class="pie"></div>
            <div class="pie-data">
                <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
                <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
                <h6>Followers</h6>
            </div>
        </div>
        <div class="score">
            <img src="//twitteraudit-prod.s3.amazonaws.com/dist/f977287de6281fe3e1ef36d48d996fb83dd6a876/img/audit-result-good.png" />
            <div class="percentage good">
                69%
            </div>
            <h6>Audit score</h6>

The values I want are 73599 from data-value=73599, 32452 from data-value=32452, and the 69% from the div with class "percentage good".

Using past code and online examples, this is what I have so far:

RealValue = soup.find("div", {"class":"real number"})['data-value']
FakeValue = soup.find("audit", {"class":"fake number"})['data-value']

Neither of these has worked so far. I'm also not sure how to write the find() call that pulls out the 69% value.


asked Dec 16 '15 00:12

OneManRiot

1 Answer

soup.find("div", {"class":"real number"})['data-value']

Here you are searching for a div element, but in your example HTML it is a span that has the "real number" class. Try this instead:

soup.find("span", {"class": "real number", "data-value": True})['data-value']

This also checks that the data-value attribute is present.
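The following minimal runnable sketch shows why the original lookup fails and how the corrected one behaves. It assumes bs4 is installed and uses Python's built-in "html.parser" on a trimmed copy of the question's markup. The int() conversion and the class_ keyword form are additions for illustration only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's "pie-data" markup; the unquoted
# data-value attributes are kept as-is (html.parser accepts them).
html = """
<div class="pie-data">
    <span class="real number" data-value=73599>73,599</span><span class="real"> Real</span><br />
    <span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span><br />
    <h6>Followers</h6>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# The original lookup asks for a <div>, so nothing matches and find() returns None.
# Subscripting that None with ['data-value'] is what raises a TypeError.
print(soup.find("div", {"class": "real number"}))  # None

# Searching for the <span> works. Attribute values come back as strings, so convert them.
real = int(soup.find("span", {"class": "real number", "data-value": True})["data-value"])
# class_ is BeautifulSoup's keyword for matching the class attribute (here, the exact string).
fake = int(soup.find("span", class_="fake number")["data-value"])
print(real, fake)  # 73599 32452
```

Because find() returns None rather than raising when nothing matches, checking the result before indexing into it avoids the confusing TypeError.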

To find elements that have either the "real number" or the "fake number" classes, you can use a CSS selector:

for elm in soup.select(".real.number,.fake.number"):
    print(elm.get("data-value"))
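A sketch of the same selector loop that collects both counts into a dictionary, assuming bs4 is installed. The `counts` name and the "real"/"fake" keys are illustrative choices, not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """
<span class="real number" data-value=73599>73,599</span><span class="real"> Real</span>
<span class="fake number" data-value=32452>32,452</span><span class="fake"> Fake</span>
"""
soup = BeautifulSoup(html, "html.parser")

counts = {}
# ".real.number" requires BOTH classes, so the plain <span class="real"> labels are skipped.
for elm in soup.select(".real.number, .fake.number"):
    # "class" is a multi-valued attribute, so BeautifulSoup returns a list such as ["real", "number"].
    label = "real" if "real" in elm["class"] else "fake"
    counts[label] = int(elm["data-value"])

print(counts)  # {'real': 73599, 'fake': 32452}
```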

To get the 69% value:

soup.find("div", {"class": "percentage good"}).get_text(strip=True)
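The strip=True argument matters here because the "69%" text is wrapped in newlines and indentation. This sketch assumes bs4 is installed and that the score is always a whole-number percentage, which is an assumption for the int() conversion.

```python
from bs4 import BeautifulSoup

html = """
<div class="score">
    <div class="percentage good">
        69%
    </div>
    <h6>Audit score</h6>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", {"class": "percentage good"})

raw = div.get_text()             # keeps the surrounding newlines and indentation
text = div.get_text(strip=True)  # "69%"
score = int(text.rstrip("%"))    # 69, assuming a whole-number percentage
print(repr(raw), text, score)
```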

Or, a CSS selector:

soup.select_one(".percentage.good").get_text(strip=True)
soup.select_one(".score .percentage").get_text(strip=True)
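Putting the selectors together, here is a sketch of a small helper that extracts all three values. The name `parse_audit` is hypothetical. It uses ".score .percentage" rather than ".percentage.good" on the assumption that the "good" modifier class may differ for other accounts. Since select_one() returns None when nothing matches, the helper checks for that before using the results.

```python
from bs4 import BeautifulSoup

def parse_audit(html):
    """Hypothetical helper: pull the real/fake counts and the score out of one audit block."""
    soup = BeautifulSoup(html, "html.parser")
    real = soup.select_one(".real.number")
    fake = soup.select_one(".fake.number")
    pct = soup.select_one(".score .percentage")
    # select_one() returns None when nothing matches, so bail out instead of crashing.
    if real is None or fake is None or pct is None:
        return None
    return {
        "real": int(real["data-value"]),
        "fake": int(fake["data-value"]),
        "score": pct.get_text(strip=True),
    }

sample = """
<div class="audit">
  <div class="followers"><div class="pie-data">
    <span class="real number" data-value=73599>73,599</span>
    <span class="fake number" data-value=32452>32,452</span>
  </div></div>
  <div class="score"><div class="percentage good">
    69%
  </div><h6>Audit score</h6></div>
</div>
"""
print(parse_audit(sample))         # {'real': 73599, 'fake': 32452, 'score': '69%'}
print(parse_audit("<div></div>"))  # None
```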

Or, locate the h6 element whose text is "Audit score" and then get its preceding sibling div:

soup.find("h6", string="Audit score").find_previous_sibling("div").get_text(strip=True)
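Sibling navigation has one subtlety. `.previous_sibling` returns the node immediately before the h6, which in this markup is the whitespace text between `</div>` and `<h6>`, not the div itself. `.find_previous_sibling("div")` skips over text nodes and returns the nearest earlier div sibling. This is a minimal sketch assuming bs4 is installed.

```python
from bs4 import BeautifulSoup

html = """
<div class="score">
    <div class="percentage good">
        69%
    </div>
    <h6>Audit score</h6>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
h6 = soup.find("h6", string="Audit score")

# .previous_sibling is the whitespace text node sitting between </div> and <h6>.
print(repr(h6.previous_sibling))

# .find_previous_sibling("div") skips text nodes and returns the <div> before the <h6>.
print(h6.find_previous_sibling("div").get_text(strip=True))  # 69%
```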

answered Sep 29 '22 00:09

alecxe
