Assume the following HTML snippet, from which I would like to extract the values corresponding to the labels 'Price' and 'Ships from':
<div class="divName">
<div>
<label>Price</label>
<div>22.99</div>
</div>
<div>
<label>Ships from</label>
<span>EU</span>
</div>
</div>
The snippet is part of a larger HTML file. In some files the 'Ships from' label is present, in others it is not. I would like to use BeautifulSoup, or a similar approach, to deal with this, because of the variability of the HTML content. Multiple div and span elements are present, which makes it hard to select without an id or class name.
My thought was something like this:
t = open('snippet.html', 'rb').read().decode('iso-8859-1')
s = BeautifulSoup(t, 'lxml')
s.find('div.divName[label*=Price]')
s.find('div.divName[label*=Ships from]')
However, this returns no results.
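(For context: `find()` interprets its first argument as a tag name, not a CSS selector, so the calls above match nothing; and even with `select()`, the `[label*=...]` syntax matches attributes, not child elements. A minimal sketch of locating the value by the label's text instead, with the snippet inlined rather than read from snippet.html:)

```python
from bs4 import BeautifulSoup

html = """<div class="divName">
<div><label>Price</label><div>22.99</div></div>
<div><label>Ships from</label><span>EU</span></div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

# find() matches tag names, so locate the <label> by its text,
# then take the element that immediately follows it
price_label = soup.find("label", string="Price")
print(price_label.find_next_sibling().text)  # 22.99
```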
Use select to find the label elements and then use find_next_sibling().text.
Ex:
from bs4 import BeautifulSoup
html = """<div class="divName">
<div>
<label>Price</label>
<div>22.99</div>
</div>
<div>
<label>Ships from</label>
<span>EU</span>
</div>
</div>"""
soup = BeautifulSoup(html, "html.parser")
for lab in soup.select("label"):
    print(lab.find_next_sibling().text)
Output:
22.99
EU
You can use :contains (with bs4 4.7.1+; newer soupsieve versions spell it :-soup-contains) and next_sibling:
from bs4 import BeautifulSoup as bs
html = '''
<div class="divName">
<div>
<label>Price</label>
<div>22.99</div>
</div>
<div>
<label>Ships from</label>
<span>EU</span>
</div>
</div>
'''
soup = bs(html, 'lxml')
items = soup.select('label:contains(Price), label:contains("Ships from")')
for item in items:
    # the first next_sibling is the whitespace text node after the label
    print(item.text, item.next_sibling.next_sibling.text)
Try this:
from bs4 import BeautifulSoup
from bs4.element import Tag
html = """ <div class="divName">
<div>
<label>Price</label>
<div>22.99</div>
</div>
<div>
<label>Ships from</label>
<span>EU</span>
</div>
</div>"""
s = BeautifulSoup(html, 'lxml')
row = s.find(class_='divName')
Solution 1:
for tag in row.findChildren():
    if len(tag) > 1:  # skip wrapper tags with more than one child
        continue
    if isinstance(tag, Tag) and tag.name == 'span':
        print(tag.text)
    elif isinstance(tag, Tag) and tag.name == 'div':
        print(tag.text)
Solution 2:
for lab in row.select("label"):
    print(lab.find_next_sibling().text)
Output:
22.99
EU
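None of the answers above covers the asker's case where the 'Ships from' label is sometimes absent. A small helper (the name `value_for_label` is illustrative, not from any answer) that returns None when a label is missing:

```python
from bs4 import BeautifulSoup

def value_for_label(soup, label_text):
    """Return the text of the element following the given <label>, or None."""
    label = soup.find("label", string=label_text)
    if label is None:
        return None
    sibling = label.find_next_sibling()
    return sibling.text if sibling is not None else None

# note: no 'Ships from' block in this document
html = """<div class="divName">
<div><label>Price</label><div>22.99</div></div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
print(value_for_label(soup, "Price"))       # 22.99
print(value_for_label(soup, "Ships from"))  # None
```

This way each field lookup degrades gracefully instead of raising AttributeError on the missing label.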