
Using BeautifulSoup to select div blocks within HTML

I am trying to parse several div blocks with Beautiful Soup, using HTML from a website. However, I cannot work out which function should be used to select these div blocks. I have tried the following:

import urllib2
from bs4 import BeautifulSoup

def getData():

    html = urllib2.urlopen("http://www.racingpost.com/horses2/results/home.sd?r_date=2013-09-22", timeout=10).read().decode('UTF-8')

    soup = BeautifulSoup(html)

    print(soup.title)
    print(soup.find_all('<div class="crBlock ">'))

getData()

I want to be able to select everything between <div class="crBlock "> and its correct end </div>. (Obviously there are other div tags but I want to select the block all the way down to the one that represents the end of this section of html.)

SMNALLY asked Sep 25 '13 17:09



1 Answer

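The correct use is to pass the tag name and the class as separate arguments rather than a string of raw HTML. Beautiful Soup splits the class attribute on whitespace, so the trailing space in the markup is not part of the class name and the filter should match on crBlock alone:

soup.find_all('div', class_="crBlock")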
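To see the call in action without fetching the live page, here is a minimal sketch that runs it against an inline snippet. The markup, the `SAMPLE_HTML` name, and the `select_blocks` helper are made up for illustration; only the `crBlock` class comes from the question. It assumes Python 3 with `bs4` installed and uses the built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page. Note the trailing
# space inside the class attribute, as in the question.
SAMPLE_HTML = """
<div class="crBlock ">
  <h3>Race 1</h3>
  <div class="inner">first block</div>
</div>
<div class="other">ignored</div>
<div class="crBlock ">
  <h3>Race 2</h3>
</div>
"""


def select_blocks(html):
    """Return every <div> whose class list contains 'crBlock'."""
    soup = BeautifulSoup(html, "html.parser")
    # Tag name and class are separate arguments, not a raw HTML string.
    return soup.find_all("div", class_="crBlock")


blocks = select_blocks(SAMPLE_HTML)
for block in blocks:
    # Each result is a whole Tag, so its children are reachable from it.
    print(block.h3.get_text())
```

Each element of `blocks` runs from the opening `<div class="crBlock ">` to its matching `</div>`, nested divs included, which is the selection the question asks for.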

By default, Beautiful Soup returns the entire tag, including its contents, so any div tags nested inside the block come along with it. If you store the result in a variable, you can search or modify it further. If you are only looking for one div, you can use find() instead, which returns the first match. For instance:

div = soup.find('div', class_="crBlock")
print(div.find_all(text='foobar'))
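As a rough sketch of the same pattern with a guard for the case where no block is found: find() returns None when nothing matches, so calling find_all() on the result directly would raise an AttributeError. The sample markup and the `strings_in_block` helper are hypothetical. The sketch also uses `string=`, which is the newer name for the `text=` argument (Beautiful Soup 4.4.0 and later); on older versions, `text=` is the equivalent.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two 'foobar' strings inside the block, one outside.
SAMPLE_HTML = """
<div class="crBlock ">
  <p>foobar</p>
  <div class="inner"><span>foobar</span></div>
</div>
<p>foobar</p>
"""


def strings_in_block(html, wanted):
    """Return strings equal to `wanted` inside the first crBlock div."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="crBlock")
    if div is None:
        # No matching block on the page.
        return []
    # The search is scoped to the block, including its nested tags.
    return div.find_all(string=wanted)


matches = strings_in_block(SAMPLE_HTML, "foobar")
print(len(matches))
```

Because the second search starts from `div` rather than `soup`, the `foobar` paragraph outside the block is not included in the result.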

See the Beautiful Soup documentation for more on the filters that find() and find_all() accept.

Wiwiweb answered Oct 05 '22 01:10
