Use BeautifulSoup to get a value after a specific tag

I'm having a very hard time getting BeautifulSoup to scrape some data for me. What's the best way to access the date (the actual number, 2008) in this code sample? It's my first time using BeautifulSoup. I've figured out how to scrape URLs off the page, but I can't narrow it down to select the word Date and then return only the numeric date that follows it (inside the dd tags). Is what I'm asking even possible?

<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
        2008
    </dd>
</div>
asked Sep 11 '14 03:09 by knames



People also ask

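How do you find a tag that contains specific text in BeautifulSoup?

To find elements that contain specific text in Beautiful Soup, you can pass a lambda (or any other function) as the filter to the find_all() method. The function is called with each tag in the document, and the tag is kept when it returns True.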
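A minimal sketch of the lambda approach, using made-up HTML modeled on the question. The lambda keeps only `<dt>` tags whose text contains "Date" (a case-sensitive substring check):

```python
from bs4 import BeautifulSoup

# Illustrative HTML, loosely based on the question's markup
html = """
<div class='detail_date'>
    <dt>Date</dt>
    <dd>2008</dd>
    <dt>Updated</dt>
    <dd>2010</dd>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A function passed as the filter is called with every Tag in the document;
# the tag is included in the results when the function returns True.
matches = soup.find_all(lambda tag: tag.name == "dt" and "Date" in tag.get_text())

for tag in matches:
    print(tag.get_text(strip=True))  # Date
```

"Updated" is not matched because the check is case-sensitive and "Updated" only contains a lowercase "date".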

What is the difference between find_all() and find() in BeautifulSoup?

find() returns only the first matching element, or None if nothing matches. find_all() scans the entire document and returns a list of every match (an empty list if there are none).
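A small sketch contrasting the two return values on a toy list (the HTML is illustrative only):

```python
from bs4 import BeautifulSoup

html = "<ul><li>a</li><li>b</li><li>c</li></ul>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find("li")              # first matching Tag only
every = soup.find_all("li")          # list-like ResultSet of all matches
missing = soup.find("table")         # no match: None
none_found = soup.find_all("table")  # no match: empty result

print(first.text)                 # a
print([li.text for li in every])  # ['a', 'b', 'c']
print(missing)                    # None
print(len(none_found))            # 0
```

Because find() can return None, chaining calls on its result (as in the answer below) raises AttributeError when the element is absent.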


1 Answer

Find the dt tag by text and find the next dd sibling:

soup.find('div', class_='detail_date').find('dt', string='Date').find_next_sibling('dd').text.strip()
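As an alternative to chaining find() calls, the same lookup can be sketched with a CSS selector via select_one(), assuming bs4 4.7.0 or later (which delegates CSS selection to the soupsieve package). The adjacent-sibling combinator `+` picks the `<dd>` immediately after a `<dt>`. Unlike the text match above, this selector does not check that the `<dt>` says "Date", so it assumes the container holds a single dt/dd pair:

```python
from bs4 import BeautifulSoup

data = """
<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
    2008
    </dd>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

# "div.detail_date dt + dd": a <dd> that immediately follows a <dt>,
# inside a <div> whose class list includes detail_date.
dd = soup.select_one("div.detail_date dt + dd")

# select_one returns None when nothing matches, so guard before using it.
year = dd.get_text(strip=True) if dd is not None else None
print(year)  # 2008
```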

The complete code:

from bs4 import BeautifulSoup

data = """
<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
    2008
    </dd>
</div>
"""

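soup = BeautifulSoup(data, 'html.parser')
# Locate the container div, then the <dt> whose text is exactly "Date"
# (string= is the bs4 4.4+ name for the older text= argument)
date_field = soup.find('div', class_='detail_date').find('dt', string='Date')
# Step to the following <dd> and strip the surrounding whitespace
print(date_field.find_next_sibling('dd').text.strip())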

Prints 2008.
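If a page has several dt/dd pairs, the same find_next_sibling() idea can be generalized into a dictionary. This is a sketch with a hypothetical helper, `definition_pairs`, and made-up sample HTML (the "Format"/"PDF" pair is invented for illustration):

```python
from bs4 import BeautifulSoup

# Sample HTML: the first div is from the question, the second is made up
data = """
<div class='dl_item_container clearfix detail_date'>
    <dt>Date</dt>
    <dd>
    2008
    </dd>
</div>
<div class='dl_item_container clearfix detail_format'>
    <dt>Format</dt>
    <dd>PDF</dd>
</div>
"""

def definition_pairs(soup):
    """Hypothetical helper: map each <dt> label to the text of the next <dd>."""
    pairs = {}
    for dt in soup.find_all("dt"):
        dd = dt.find_next_sibling("dd")
        if dd is not None:  # skip labels that have no following <dd>
            pairs[dt.get_text(strip=True)] = dd.get_text(strip=True)
    return pairs

soup = BeautifulSoup(data, "html.parser")
fields = definition_pairs(soup)
print(fields)              # {'Date': '2008', 'Format': 'PDF'}
print(fields.get("Date"))  # 2008
```

This assumes each `<dt>` is followed by its own `<dd>`; if a label has no value, find_next_sibling() would skip ahead to a later `<dd>` in the same parent.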

answered Nov 03 '22 21:11 by alecxe
