
How to extract value from span tag

I am writing a simple web scraper to extract the game times for NCAA basketball games. The code doesn't need to be pretty, it just needs to work. I have extracted values from other span tags on the same page, but for some reason I cannot get this one to work.

from bs4 import BeautifulSoup as soup
import requests

url = 'http://www.espn.com/mens-college-basketball/game/_/id/401123420'
response = requests.get(url)
soupy = soup(response.content, 'html.parser')

containers = soupy.findAll("div",{"class" : "team-container"})
for container in containers:
    spans = container.findAll("span")
    divs = container.find("div",{"class": "record"})
    ranks = spans[0].text
    team_name = spans[1].text
    team_mascot = spans[2].text
    team_abbr = spans[3].text
    team_record = divs.text
    time_container = soupy.find("span", {"class":"time game-time"})
    game_times = time_container.text
    refs_container = soupy.find("div", {"class" : "game-info-note__container"})
    refs = refs_container.text
    print(ranks)
    print(team_name)
    print(team_mascot)
    print(team_abbr)
    print(team_record)
    print(game_times)
    print(refs)

The specific code I am concerned about is this:

    time_container = soupy.find("span", {"class":"time game-time"})
    game_times = time_container.text

I only included the rest of the code to show that .text works on the other span tags. The time is the only data I really want, but with my current code I just get an empty string.

This is what I get when I print time_container:

<span class="time game-time" data-dateformat="time1" data-showtimezone="true"></span>

or just '' when I print game_times.

Here is the line of the HTML from the website:

<span class="time game-time" data-dateformat="time1" data-showtimezone="true">6:10 PM CT</span>

I don't understand why the 6:10 pm is gone when I run the script.

zezima asked Apr 09 '19 22:04



2 Answers

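The page is dynamic: the game time is filled in by JavaScript after the page loads, so the HTML that requests receives contains only the empty span. One way to get the rendered value is to use Selenium: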

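from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Selenium 4 takes the driver path through a Service object
d = webdriver.Chrome(service=Service('/path/to/chromedriver'))
d.get('http://www.espn.com/mens-college-basketball/game/_/id/401123420')
# page_source is the page as the browser sees it, not the raw HTTP response
game_time = soup(d.page_source, 'html.parser').find('span', {'class':'time game-time'}).text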

Output:

'7:10 PM ET'

See the full Selenium documentation for details.
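
To see that the parser is not at fault, the two spans quoted in the question can be compared directly. This is only a minimal sketch using the standard library's html.parser so it runs without extra installs; the SpanText class and span_text helper are hypothetical names introduced for illustration, not part of BeautifulSoup or Selenium.

```python
from html.parser import HTMLParser


class SpanText(HTMLParser):
    """Collect the text inside <span> tags that carry the wanted classes."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted = set(wanted_class.split())
        self.depth = 0      # greater than 0 while inside a matching span
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = set((dict(attrs).get("class") or "").split())
        # Count nested spans too, so the matching close tag is tracked correctly
        if self.depth or self.wanted <= classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def span_text(html, wanted_class="time game-time"):
    """Return the text found inside spans with the given classes."""
    parser = SpanText(wanted_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# The two spans quoted in the question: what requests received, and what the browser shows
static_html = '<span class="time game-time" data-dateformat="time1" data-showtimezone="true"></span>'
rendered_html = '<span class="time game-time" data-dateformat="time1" data-showtimezone="true">6:10 PM CT</span>'

print(repr(span_text(static_html)))    # ''
print(repr(span_text(rendered_html)))  # '6:10 PM CT'
```

The span in the downloaded HTML is simply empty, so any parser returns an empty string for it; the text only exists once a browser has run the page's scripts.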

Ajax1234 answered Oct 22 '22 22:10


An alternative is to call one of ESPN's endpoints directly. These endpoints return JSON responses, for example:

https://site.api.espn.com/apis/site/v2/sports/basketball/mens-college-basketball/scoreboard

You can find other endpoints in this GitHub gist: https://gist.github.com/akeaswaran/b48b02f1c94f873c6655e7129910fc3b

This will make your application much more lightweight than running Selenium.

I also recommend opening your browser's developer tools (Inspect) and going to the Network tab. There you can see every request the page makes.
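
As a rough sketch of the endpoint approach, the code below pulls game names and start times out of a scoreboard response using only the standard library. It assumes the response has a top-level "events" list whose entries carry "name" and "date" keys, with "date" as a UTC ISO 8601 string; check the real JSON before relying on that shape. The function names and the sample payload are hypothetical and exist only for illustration.

```python
import json
from datetime import datetime
from urllib.request import urlopen

SCOREBOARD_URL = ("https://site.api.espn.com/apis/site/v2/sports/basketball/"
                  "mens-college-basketball/scoreboard")


def parse_game_times(payload):
    """Return (game name, UTC start time) pairs from a scoreboard payload.

    Assumes each entry in payload["events"] has "name" and "date" keys,
    with "date" formatted like "2019-04-09T01:20Z".
    """
    games = []
    for event in payload.get("events", []):
        # Older Pythons reject a trailing "Z" in fromisoformat(), so spell out the offset
        start = datetime.fromisoformat(event["date"].replace("Z", "+00:00"))
        games.append((event["name"], start))
    return games


def fetch_scoreboard(url=SCOREBOARD_URL):
    """Download and decode the scoreboard JSON (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


# Hypothetical sample shaped like the assumed response, so the parser can be tried offline
sample = {"events": [{"name": "Team A at Team B", "date": "2019-04-09T01:20Z"}]}
for name, start in parse_game_times(sample):
    print(name, start.isoformat())
```

Against the live endpoint, the call would be parse_game_times(fetch_scoreboard()). The times come back in UTC, so they still need converting to the local time zone before display.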

Jose Ortiz answered Oct 23 '22 00:10