Python web scraping involving HTML a tag

I've been trying to scrape the names in a table from a website using a BeautifulSoup script, but the program returns nothing, or "[]". I would appreciate it if anyone could help by pointing out what I'm doing wrong. Here is what I'm trying to run:

from bs4 import BeautifulSoup
import urllib2

url="http://www.trackinfo.com/entries-race.jsp?raceid=GBM$20140228E02"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
names=soup.findAll('a',{'href':'href="dog.jsp?runnername=[^.]*'})
for eachname in names:
print eachname.string

And here is one of the elements that I'm trying to get:

<a href="dog.jsp?runnername=PG+BAD+GRANDPA">

                        PG BAD GRANDPA

                        </a>
asked Dec 22 '25 15:12 by user3319895


2 Answers

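See the BeautifulSoup documentation: if you want to search with a regular expression, you need to pass in a compiled regular expression. A plain string is compared against the whole attribute value, so your pattern is treated as literal text and matches nothing. Note also that the attribute value does not include the leading href=" that your pattern starts with.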

Taking your variables, this is what you want:

import re
names = soup.find_all("a",{"href":re.compile("dog")})
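To see the difference without hitting the live site, here is a minimal sketch that runs both searches against an inline HTML snippet. The snippet is modelled on the element from the question; the surrounding table and the second link (results.jsp) are made up for illustration, and the stricter pattern is just one way to write it.

```python
import re
from bs4 import BeautifulSoup

# Inline sample modelled on the element from the question.
# The table wrapper and the "results.jsp" link are invented for illustration.
html = """
<table>
  <tr><td><a href="dog.jsp?runnername=PG+BAD+GRANDPA">
        PG BAD GRANDPA
  </a></td></tr>
  <tr><td><a href="results.jsp">Results</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole href value, so the pattern text finds nothing.
as_string = soup.find_all("a", {"href": 'href="dog.jsp?runnername=[^.]*'})

# A compiled pattern is searched for inside each href value.
as_regex = soup.find_all("a", {"href": re.compile(r"dog\.jsp\?runnername=")})

# The link text carries surrounding whitespace, so strip it.
names = [a.get_text().strip() for a in as_regex]
print(len(as_string), names)
```

The first search comes back empty, which is the "[]" from the question; the second finds the one matching link.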
answered Dec 24 '25 03:12 by gabe


A different approach, this one using Requests instead of urllib2. Matter of preference, really. Main point is that you should clean up your code, especially the indentation on the last line.

from bs4 import BeautifulSoup as bs
import requests
import re

url = "http://www.trackinfo.com/entries-race.jsp?raceid=GBM$20140228E02"
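# Fetch the raw page bytes.
r = requests.get(url).content
# Name the parser explicitly so the result does not depend on what is installed.
soup = bs(r, "html.parser")

# A compiled regex is searched for inside each href value.
names = soup.find_all("a", href=re.compile("dog"))

for name in names:
    # The link text is padded with whitespace, so strip it before printing.
    print(name.get_text().strip())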
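One caveat: re.compile("dog") matches any href that contains "dog" anywhere. If the page has other such links, a stricter pattern may be safer. The sketch below compares the two using only the re module; the second and third hrefs are hypothetical examples, not taken from the site.

```python
import re

# Only the first href comes from the question; the others are hypothetical.
hrefs = [
    "dog.jsp?runnername=PG+BAD+GRANDPA",
    "hotdog-stand.html",
    "entries-race.jsp?raceid=X",
]

# Loose: matches "dog" anywhere in the value.
loose = re.compile("dog")
# Strict: anchored at the start, with "." and "?" escaped so they are literal.
strict = re.compile(r"^dog\.jsp\?runnername=")

loose_hits = [h for h in hrefs if loose.search(h)]
strict_hits = [h for h in hrefs if strict.search(h)]

print(loose_hits)
print(strict_hits)
```

Either compiled pattern can be passed as the href argument to find_all in the same way.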

Let us know if this helps.

answered Dec 24 '25 03:12 by NullDev


