I'm learning web scraping and I've been trying to write a program that extracts information from Steam's website as an exercise.
I want to write a program that visits the page of each of the top 10 best-selling games and extracts something, but my program gets redirected to the age check page whenever it tries to visit an M-rated game.
My program looks something like this:
from urllib.request import urlopen
from bs4 import BeautifulSoup

front_page = urlopen('http://store.steampowered.com/').read()
bs = BeautifulSoup(front_page, 'html.parser')
top_sellers = bs.select('#tab_topsellers_content a.tab_item_overlay')
for item in top_sellers:
    game_page = urlopen(item.get('href'))
    bs = BeautifulSoup(game_page.read(), 'html.parser')
    # Now I'm on the age check page :(
I don't know how to get past the age check. I've tried submitting the age check form by sending a POST request to it like this:
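from urllib.parse import urlencode
# agecheckurl is the URL of the age check page I was redirected to
post_params = urlencode({'ageDay': '1', 'ageMonth': 'January', 'ageYear': '1988', 'snr': '1_agecheck_agecheck__age-gate'}).encode('utf-8')
page = urlopen(agecheckurl, post_params)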
But it doesn't work; I'm still on the age check page. Can anyone help me get past it?
Okay, it seems Steam uses a cookie to save the age check result. The cookie is named birthtime.
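A note on the value: the birthtime used in this answer, 568022401, looks like a Unix timestamp (seconds since 1970-01-01 UTC). That is my assumption, nothing in the thread confirms how Steam reads it, but read that way it lands on 1 January 1988, the same date the question's form submission used. A minimal sketch for converting between a birth date and such a value, where birthtime_for and describe are hypothetical helper names:

```python
from datetime import datetime, timezone


def birthtime_for(year, month, day):
    """Unix timestamp (whole seconds) for midnight UTC on the given date.

    Assumption: the birthtime cookie is a plain Unix timestamp.
    """
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def describe(birthtime):
    """ISO 8601 date and time, in UTC, for a birthtime value."""
    return datetime.fromtimestamp(birthtime, tz=timezone.utc).isoformat()


print(describe(568022401))        # the value used in this answer
print(birthtime_for(1988, 1, 1))  # midnight UTC on the same date
```

Any date far enough in the past should presumably work the same way, though I have only seen the one value above.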
Since I don't know how to set cookies using urllib, here is an example using requests:
import requests
cookies = {'birthtime': '568022401'}
r = requests.get('http://store.steampowered.com/', cookies=cookies)
Now there is no age check.
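For anyone who wants to stay with urllib as in the question, the same cookie can probably be sent as a plain Cookie header. This is a rough sketch that I have not run against Steam; build_request and fetch are hypothetical helper names, and the cookie value is the one from the requests example:

```python
from urllib.request import Request, urlopen

# Same birthtime value as in the requests example.
AGE_COOKIE = "birthtime=568022401"


def build_request(url):
    """Return a Request that carries the birthtime cookie as a header."""
    return Request(url, headers={"Cookie": AGE_COOKIE})


def fetch(url):
    """Download the page body as bytes, sending the cookie with the request."""
    with urlopen(build_request(url)) as response:
        return response.read()
```

In the loop from the question, that would mean calling fetch(item.get('href')) instead of urlopen(item.get('href')).read().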