I've been learning the basics of Python for a short while, and thought I'd go ahead and try to put something together, but appear to have hit a stumbling block (despite looking just about everywhere to see where I may be going wrong).
I'm trying to grab a table i.e. from here: https://www.oddschecker.com/horse-racing/2020-09-10-chelmsford-city/20:30/winner
Now I realize that the table isn't laid out the way a normal HTML table typically would be, so trying to grab it with Pandas wouldn't yield results. I therefore delved into BeautifulSoup to try to get a result.
It seems all the data I need is within the class 'diff-row evTabRow bc', so I wrote the following:
import requests
from bs4 import BeautifulSoup

url = requests.get('https://www.oddschecker.com/horse-racing/2020-09-10-haydock/14:00/winner')
soup = BeautifulSoup(url.content, 'lxml')
table = soup.find_all("tr", class_="diff-row evTabRow bc")
This seems to put each horse, along with all the corresponding data I'd need for it, into a list. Within this list I'd only need certain bits, e.g. "data-name" for the horse name and "data-odig" for the current odds.
I thought there may be some way I could then extract the data from the list to build a list of lists, and then construct a data frame in Pandas, but I may be going about this all wrong.
Pandas makes it easy to scrape a table (a <table> tag) on a web page. Once you have it as a DataFrame, you can process it further and save it as an Excel or CSV file.
You can access any of the <tr> attributes with the BeautifulSoup object's .attrs property.
Once you have table, loop over each entry and pull out the attributes you want as a list of dicts. Then initialize a Pandas data frame with the resulting list.
import pandas as pd

horse_attrs = list()
for entry in table:
    # Each entry is a <tr> tag; .attrs is a dict of its HTML attributes
    attrs = dict(name=entry.attrs['data-bname'], dig=entry.attrs['data-best-dig'])
    horse_attrs.append(attrs)
df = pd.DataFrame(horse_attrs)
df
name dig
0 Las Farras 9999
1 Heat Miami 9999
2 Martin Beck 9999
3 Litran 9999
4 Ritmo Capanga 9999
5 Perfect Score 9999
6 Simplemente Tuyo 9999
7 Anpacai 9999
8 Colt Fast 9999
9 Cacharpari 9999
10 Don Leparc 9999
11 Curioso Seattle 9999
12 Golpe Final 9999
13 El Acosador 9999
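If you want to try the loop above without hitting the live site, here is a minimal sketch on a made-up HTML fragment. The fragment, the horse names, and the values are assumptions for illustration only, not real oddschecker markup. It also uses .get() instead of indexing .attrs directly, so a row that lacks an attribute yields a missing value rather than a KeyError.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up fragment that mimics the row structure (an assumption, not real site markup).
SAMPLE_HTML = """
<table>
  <tr class="diff-row evTabRow bc" data-bname="Horse A" data-best-dig="3.5"></tr>
  <tr class="diff-row evTabRow bc" data-bname="Horse B"></tr>
</table>
"""

# html.parser ships with Python, so this sketch does not need lxml.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find_all("tr", class_="diff-row evTabRow bc")

# Tag.get() returns None when the attribute is absent instead of raising KeyError.
horse_attrs = [
    {"name": row.get("data-bname"), "dig": row.get("data-best-dig")}
    for row in rows
]

df = pd.DataFrame(horse_attrs)
print(df)
```

Whether a missing attribute should be kept as an empty cell or dropped depends on what you plan to do with the frame afterwards.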
Notes:
I couldn't find the exact attributes (data-name and data-odig) you mentioned, so I used ones with similar names. I don't know enough about horse racing to know if these are useful, but the method in this answer should allow you to choose any of the attributes that are available.
The data you are looking for is both in the row tag <tr> and in the cell tags <td>.
The issue is that not all of the <td>'s are useful, so you have to skip those.
import pandas as pd
from bs4 import BeautifulSoup
import requests

url = requests.get('https://www.oddschecker.com/horse-racing/thirsk/13:00/winner')
soup = BeautifulSoup(url.content, 'lxml')
rows = soup.find_all("tr", class_="diff-row evTabRow bc")

my_data = []
for row in rows:
    horse = row.attrs['data-bname']
    # Iterate over <td> cells only; iterating the row directly can also yield text nodes
    for td in row.find_all('td'):
        classes = td.get('class') or []
        # Skip the cells we don't need: keep only those whose first class is 'np'
        if not classes or classes[0] != 'np':
            continue
        bookie = td['data-bk']
        odds = td['data-odig']
        my_data.append(dict(
            horse = horse,
            bookie = bookie,
            odds = odds
        ))
df = pd.DataFrame(my_data)
print(df)
This will give you what you are looking for:
horse bookie odds
0 Just Frank B3 3.75
1 Just Frank SK 4.33
2 Just Frank WH 4.33
3 Just Frank EE 4.33
4 Just Frank FB 4.2
.. ... ... ...
268 Tommy R RZ 29
269 Tommy R SX 26
270 Tommy R BF 10.8
271 Tommy R MK 41
272 Tommy R MA 98
[273 rows x 3 columns]
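The cell filtering can also be checked offline. The sketch below runs the same idea against a made-up fragment (the markup, bookie codes, and odds are assumptions for illustration, not copied from the site). One thing worth knowing: attribute values come back as strings, so the last step converts the odds column with pd.to_numeric before you do any arithmetic on it.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Made-up fragment: one name cell to skip, two cells whose first class is "np".
SAMPLE_HTML = """
<table>
  <tr class="diff-row evTabRow bc" data-bname="Horse A">
    <td class="sel nm">Horse A</td>
    <td class="np o" data-bk="B3" data-odig="3.75"></td>
    <td class="np o" data-bk="SK" data-odig="4.33"></td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

records = []
for row in soup.find_all("tr", class_="diff-row evTabRow bc"):
    for td in row.find_all("td"):
        classes = td.get("class") or []
        # Keep only cells whose first class is "np"; skip everything else.
        if not classes or classes[0] != "np":
            continue
        records.append(
            {"horse": row["data-bname"], "bookie": td["data-bk"], "odds": td["data-odig"]}
        )

df = pd.DataFrame(records)

# HTML attribute values are strings; convert them before sorting or comparing.
df["odds"] = pd.to_numeric(df["odds"])
print(df)
print(df.dtypes)
```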