Extracting items out of an element.ResultSet

I found a cool Python script that scrapes player information off of NFL rosters. However, I would like to add NFL Combine results to the data. I have included an example below for one player.

import urllib.request
from bs4 import BeautifulSoup

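# urlopen needs a full URL including the scheme
URL2 = 'http://www.nfl.com/player/deandrewwhite/2552657/combine'
# Name the parser explicitly so BeautifulSoup does not have to guess one
soupCombine = BeautifulSoup(urllib.request.urlopen(URL2), 'html.parser')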
Combinestats = soupCombine.find_all("div", attrs = {"class": "tp-title"})
Combinestats[0].contents

Produces:

['3 Cone Drill', <span class="tp-results">6.97 secs</span>]

How do I get the following out of Combinestats[0].contents?

DrillName = '3 Cone Drill'

DrillResult = 6.97

For reference here are the items in Combinestats.

for ii in range(len(Combinestats)):
    print(Combinestats[ii].contents)

['3 Cone Drill', <span class="tp-results">6.97 secs</span>]
['40 Yard Dash', <span class="tp-results">4.44 Secs</span>]
['Broad Jump', <span class="tp-results">118.0 inches</span>]
['20 Yard Shuttle', <span class="tp-results">4.18 secs</span>]
['Vertical Jump', <span class="tp-results">34.5 inches</span>]
DeeeeRoy asked Mar 23 '18 16:03


2 Answers

Just use a list comprehension.

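# soupCombine is the BeautifulSoup object built in the question
resultSet = soupCombine.find_all("div", attrs = {"class": "tp-title"})
stats = [
    # contents[0] is the drill name, contents[1] is the <span> holding the result
    (i.contents[0], i.contents[1].text) for i in resultSet
]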

Or use a for loop.

stats = []
for i in resultSet:
    # append takes a single argument, so wrap the pair in a tuple
    stats.append((i.contents[0], i.contents[1].text))

print(stats)
[
    ('40 Yard Dash', '4.44 Secs'),
    ('3 Cone Drill', '6.97 secs'),
    ('Broad Jump', '118.0 inches'),
    ('20 Yard Shuttle', '4.18 secs'),
    ('Vertical Jump', '34.5 inches')
]
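The question asks for DrillResult as a number (6.97), while the tuples above still hold strings such as '6.97 secs'. A minimal sketch of one way to split the value from the unit follows. The helper name parse_result and the DrillUnit variable are hypothetical, not part of the answer, and the regular expression assumes every result starts with a plain decimal number.

```python
import re

# Hypothetical helper (not from the answer): "6.97 secs" -> (6.97, "secs").
def parse_result(text):
    # Assumes the string begins with a decimal number followed by a unit.
    match = re.match(r"\s*([-+]?\d+(?:\.\d+)?)\s*(.*)", text)
    if match is None:
        raise ValueError(f"no leading number in {text!r}")
    value, unit = match.groups()
    # Lower-case the unit so "Secs" and "secs" compare equal.
    return float(value), unit.strip().lower()

# Sample pairs copied from the output shown in the question.
stats = [
    ("3 Cone Drill", "6.97 secs"),
    ("40 Yard Dash", "4.44 Secs"),
    ("Broad Jump", "118.0 inches"),
]

# Map each drill name to a (value, unit) pair.
parsed = {name: parse_result(result) for name, result in stats}

# Pull out the first drill in the form the question asks for.
DrillName, (DrillResult, DrillUnit) = next(iter(parsed.items()))
print(DrillName, DrillResult, DrillUnit)
```

Keeping the unit next to the number avoids mixing seconds and inches later, for example when the values are loaded into a table.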
cs95 answered Sep 30 '22 16:09


Here is another approach that does the same thing with CSS selectors and a dict comprehension. It is slightly awkward to look at, though.

import requests
from bs4 import BeautifulSoup

URL = "http://www.nfl.com/player/deandrewwhite/2552657/combine"
res = requests.get(URL)
soup = BeautifulSoup(res.text,"lxml")
# Key: the text node just before the result span (the drill name); value: the span's text
items = {
    item.select_one(".tp-results").previous_sibling: item.select_one(".tp-results").text
    for item in soup.select(".tp-title")
}
print(items)

Output:

{'3 Cone Drill': '6.97 secs', '20 Yard Shuttle': '4.18 secs', '40 Yard Dash': '4.44 Secs', 'Vertical Jump': '34.5 inches', 'Broad Jump': '118.0 inches'}
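One thing to watch with previous_sibling is that it returns the raw text node, so the key would carry any whitespace or newlines that sit between the tags. The sketch below runs offline against an inline fragment and uses stripped_strings instead. The HTML is an assumption reconstructed from the .contents output in the question (the first div is indented on purpose to show the whitespace case), and "html.parser" is used only to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Assumed markup, modelled on the .contents output printed in the question.
HTML = """
<div class="tp-title">
  3 Cone Drill
  <span class="tp-results">6.97 secs</span>
</div>
<div class="tp-title">40 Yard Dash<span class="tp-results">4.44 Secs</span></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

drills = {}
for div in soup.select(".tp-title"):
    # stripped_strings yields each text node with surrounding whitespace
    # removed and skips nodes that are whitespace only.
    name, result = list(div.stripped_strings)[:2]
    drills[name] = result

print(drills)
```

This relies on each div holding exactly the drill name followed by one result span. If the real page has extra text nodes, the slice would need adjusting.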
SIM answered Sep 30 '22 17:09