Before I build a full solution to my problem using Scrapy I am posting a simplistic version of what I want to do:
import requests
url = 'http://www.whoscored.com/stageplayerstatfeed/?field=1&isAscending=false&orderBy=Rating&playerId=-1&stageId=9155&teamId=32"'
params = {'d': date.strftime('%Y%m'), 'isAggregate': 'false'}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36'}
response = requests.get(url, params=params, headers=headers)
fixtures = response.body
#fixtures = literal_eval(response.content)
print fixtures
This code is saying that the above URL does not exist. The URL relates to an XHR request that is submitted when you toggle from the 'Overall' to the 'Home' tab of the main table on this page:
http://www.whoscored.com/Teams/32/
If you activate XHR logging within the Console of Google Developer Tools you can see both the XHR request and the response sent from the server in the form of a dictionary (which is the expected format).
Can anyone tell me why the above code is not returning the data I would expect to see?
Thanks
XMLHttpRequest (XHR) objects are used to interact with servers. You can retrieve data from a URL without having to do a full page refresh. This enables a Web page to update just part of a page without disrupting what the user is doing. XMLHttpRequest is used heavily in AJAX programming.
XMLHttpRequest (XHR) is a JavaScript API to create AJAX requests. Its methods provide the ability to send network requests between the browser and a server.
You have several problems:
http://www.whoscored.com/stageplayerstatfeed
GET
parametersresponse.json()
, not response.body
The fixed version:
import requests
url = 'http://www.whoscored.com/stageplayerstatfeed'
params = {
'field': '1',
'isAscending': 'false',
'orderBy': 'Rating',
'playerId': '-1',
'stageId': '9155',
'teamId': '32'
}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36',
'X-Requested-With': 'XMLHttpRequest',
'Host': 'www.whoscored.com',
'Referer': 'http://www.whoscored.com/Teams/32/'}
response = requests.get(url, params=params, headers=headers)
fixtures = response.json()
print fixtures
Prints:
[
{
u'AccurateCrosses': 0,
u'AccurateLongBalls': 10,
u'AccuratePasses': 89,
u'AccurateThroughBalls': 0,
u'AerialLost': 2,
u'AerialWon': 4,
...
},
...
]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With