Python get raises HTTPError 400 Client Error, but after manually accessing URL, get works temporarily

When I run this code in iPython (Python 2.7):

from requests import get
_get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203083, 'Season':'2015-16', 'SeasonType':'Regular Season'})
print _get.url
_get.raise_for_status()
_get.json()

I am getting:

http://stats.nba.com/stats/playergamelog?PlayerID=203083&Season=2015-16&SeasonType=Regular+Season
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-5-8f8343b2c4cd> in <module>()
      1 _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203083, 'Season':'2015-16', 'SeasonType':'Regular Season'})
      2 print _get.url
----> 3 _get.raise_for_status()
      4 _get.json()

/Library/Python/2.7/site-packages/requests/models.pyc in raise_for_status(self)
    849 
    850         if http_error_msg:
--> 851             raise HTTPError(http_error_msg, response=self)
    852 
    853     def close(self):

HTTPError: 400 Client Error: Bad Request

However, if I open the URL in my browser (Chrome, the same browser iPython is running in), it works. Then, when I come back and run the code again after manually visiting the URL, it runs with no error. On subsequent executions, though, it may go back to raising the error.

This code has worked for me hundreds if not thousands of times with no issue. How do I fix this error?

Thanks.

asked Oct 18 '22 by andingo

1 Answer

HTTPError: 400 Client Error: Bad Request means the server rejected the request you made. I think the server may be checking some headers in the HTTP request, for example the User-Agent.

So I tried setting the User-Agent header to mimic Firefox:

# No User-Agent
>>> from requests import get
>>> _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203082, 'Season':'2015-16', 'SeasonType':'Regular Season'})
>>> _get.raise_for_status()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\requests\models.py", line 840, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://stats.nba.com/stats/playergamelog?PlayerID=203082&Season=2015-16&SeasonType=Regular+Season

# This time, set user-agent to mimic a desktop browser
>>> headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'}
>>> _get = get('http://stats.nba.com/stats/playergamelog', params={'PlayerID': 203082, 'Season':'2015-16', 'SeasonType':'Regular Season'}, headers=headers)
>>> _get.raise_for_status()
>>>
# no error
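
If you make many of these calls, it may be tidier to attach the header to a requests.Session once so every request sends it. A minimal sketch, reusing the question's endpoint and parameters; the User-Agent string is simply the Firefox one borrowed from the example above:

from requests import Session

session = Session()
# Send a browser-like User-Agent with every request made through this session
session.headers.update({
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
})

resp = session.get('http://stats.nba.com/stats/playergamelog',
                   params={'PlayerID': 203082, 'Season': '2015-16', 'SeasonType': 'Regular Season'})
resp.raise_for_status()  # should no longer raise 400 if the missing header was the cause
data = resp.json()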

The reason the request works after you visit the URL in a browser is caching.

According to Alastair McCormack, stats.nba.com is fronted by the Akamai CDN, so the caching is probably happening at the edge, "varied" by the query string/URI rather than by extraneous headers. Once a valid response has been produced for that URI, it is cached by the CDN edge node serving that client.

So when you run the code after visiting the URL in a browser, the CDN returns the cached response to you, and no 400 is raised in that situation.
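
If you want to check whether a response came from an edge cache, you can dump the response headers and look for cache hints. Which cache-related headers actually appear (Age, Via, X-Cache, and so on) depends entirely on how the CDN is configured, so treat the names below as assumptions rather than guarantees:

from requests import get

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'}
resp = get('http://stats.nba.com/stats/playergamelog',
           params={'PlayerID': 203083, 'Season': '2015-16', 'SeasonType': 'Regular Season'},
           headers=headers)

# Print every response header; cache hints such as Age, Via or X-Cache
# may or may not be present depending on the CDN configuration.
for name, value in resp.headers.items():
    print '%s: %s' % (name, value)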

answered Nov 03 '22 by realli