On the https://www.hltv.org/matches page, matches are divided by date, but the classes are the same. I mean,
This is the class for today's matches:
<div class="match-day"><div class="standard-headline">2018-05-01</div>
This is the class for tomorrow's matches:
<div class="match-day"><div class="standard-headline">2018-05-02</div>
What I'm trying to do is get the links under the "standard-headline" class, but only for today's matches. In other words, I want only the first one.
Here is my code.
import urllib.request
from bs4 import BeautifulSoup
headers = {}  # Headers give information about you, like your operating system, your browser, etc.
headers['User-Agent'] = 'Mozilla/5.0'  # I defined a user agent because HLTV perceives my connection as a bot.
hltv = urllib.request.Request('https://www.hltv.org/matches', headers=headers)  # Building the request for the website
session = urllib.request.urlopen(hltv)
sauce = session.read()  # Getting the source of the website
soup = BeautifulSoup(sauce, 'lxml')
matchlinks = []
# Getting the match pages' links.
for section in soup.find_all('div', class_='upcoming-matches'):  # Looking for the "upcoming-matches" class in the source.
    for link in section.find_all('a', href=True):  # Finding "a" tags that have an href under "upcoming-matches".
        clearlink = link.get('href')  # Getting the value of the href attribute.
        if clearlink.startswith('/matches/'):  # Checking whether the link starts with "/matches/"
            matchlinks.append('https://hltv.org' + clearlink)  # Adding it to the list.
Actually, the website shows today's matches first (at the top), followed by the next days' matches. So, if you want to get today's matches, you can simply use find(), which returns the first match found.
Using this will give you what you want:
today = soup.find('div', class_='match-day')
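To show how the links can then be collected from that first block, here is a minimal sketch. The SAMPLE_HTML string and the first_day_links() helper are made up for illustration and only imitate the structure described above; the real page may differ. It uses the built-in 'html.parser' so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure described in the question.
SAMPLE_HTML = """
<div class="upcoming-matches">
  <div class="match-day">
    <span class="standard-headline">2018-05-01</span>
    <a href="/matches/1/team-a-vs-team-b">A vs B</a>
    <a href="/matches/2/team-c-vs-team-d">C vs D</a>
  </div>
  <div class="match-day">
    <span class="standard-headline">2018-05-02</span>
    <a href="/matches/3/team-e-vs-team-f">E vs F</a>
  </div>
</div>
"""

def first_day_links(html):
    """Return absolute match links found in the first "match-day" block."""
    soup = BeautifulSoup(html, 'html.parser')
    today = soup.find('div', class_='match-day')  # first block only
    if today is None:  # no "match-day" block on the page
        return []
    return [
        'https://hltv.org' + a['href']
        for a in today.find_all('a', href=True)  # search inside the block, not the whole soup
        if a['href'].startswith('/matches/')
    ]

links = first_day_links(SAMPLE_HTML)
print(links)
```

The important part is calling find_all() on today rather than on soup, so that links from later days are not picked up.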
But if you want to specify the date explicitly, you can find the tag containing today's date by passing text='2018-05-02' to the find() method. Note that in the page source, the tag is <span class="standard-headline">2018-05-02</span> and not a <div> tag. After getting this tag, use .parent to get the enclosing <div class="match-day"> tag.
today = soup.find('span', text='2018-05-02').parent
Again, if you want to make the solution more generic, you can use datetime.date.today() instead of the hard-coded date.
today = soup.find('span', text=datetime.date.today()).parent
You'll have to import the datetime module for this.
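One thing to watch out for: find() returns None when no tag matches (for example, if there are no matches on that date, or the site's date differs from your local date), and calling .parent on None raises an AttributeError. Below is a rough sketch that guards against this. The SAMPLE_HTML string and the links_for_date() helper are hypothetical and only mimic the structure described above. It uses string=, the newer name for the text= argument in BeautifulSoup 4.4.0 and later, and converts the date to a string explicitly with isoformat().

```python
import datetime
from bs4 import BeautifulSoup

# Hypothetical markup that imitates the structure described in the answer.
SAMPLE_HTML = """
<div class="upcoming-matches">
  <div class="match-day">
    <span class="standard-headline">2018-05-01</span>
    <a href="/matches/1/team-a-vs-team-b">A vs B</a>
  </div>
  <div class="match-day">
    <span class="standard-headline">2018-05-02</span>
    <a href="/matches/3/team-e-vs-team-f">E vs F</a>
  </div>
</div>
"""

def links_for_date(html, day):
    """Return absolute match links listed under the headline for `day`."""
    soup = BeautifulSoup(html, 'html.parser')
    # isoformat() gives 'YYYY-MM-DD', the format used in the headline.
    headline = soup.find('span', class_='standard-headline', string=day.isoformat())
    if headline is None:  # no block for that date
        return []
    match_day = headline.parent  # the enclosing <div class="match-day">
    return [
        'https://hltv.org' + a['href']
        for a in match_day.find_all('a', href=True)
        if a['href'].startswith('/matches/')
    ]

print(links_for_date(SAMPLE_HTML, datetime.date(2018, 5, 2)))
```

For the live page you would pass datetime.date.today() as the second argument instead of a fixed date.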