I am trying to parse several div blocks using Beautiful Soup using some html from a website. However, I cannot work out which function should be used to select these div blocks. I have tried the following:
import urllib2
from bs4 import BeautifulSoup
def getData():
html = urllib2.urlopen("http://www.racingpost.com/horses2/results/home.sd?r_date=2013-09-22", timeout=10).read().decode('UTF-8')
soup = BeautifulSoup(html)
print(soup.title)
print(soup.find_all('<div class="crBlock ">'))
getData()
I want to be able to select everything between <div class="crBlock ">
and its correct end </div>
. (Obviously there are other div tags but I want to select the block all the way down to the one that represents the end of this section of html.)
The correct use would be:
soup.find_all('div', class_="crBlock ")
By default, beautiful soup will return the entire tag, including contents. You can then do whatever you want to it if you store it in a variable. If you are only looking for one div, you can also use find()
instead. For instance:
div = soup.find('div', class_="crBlock ")
print(div.find_all(text='foobar'))
Check out the documentation page for more info on all the filters you can use.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With