404 status code while making an HTTP request via Python's "requests" library, although the page loads fine in the browser

Question

I am trying to scrape the content of a few websites. Some of them return a response with status code 200, but others return a 404 status code. When I open the sites that return 404 in a browser, they load fine. What am I missing here?

For example:

import requests

url_1 = "https://www.transfermarkt.com/jumplist/startseite/wettbewerb/GB1"
url_2 = "https://stackoverflow.com/questions/36516183/what-should-i-use-instead-of-urlopen-in-urllib3"

page_t = requests.get(url_2)
print(page_t.status_code)      # Getting a valid HTML page and 200 status

page = requests.get(url_1)
print(page.status_code)       # Getting a Not Found page and 404 status

Moinuddin Quadri · Accepted Answer

The website you mentioned (transfermarkt.com) checks the "User-Agent" header of the request. By default, requests identifies itself as "python-requests/<version>", and this site answers that with a 404. You can send a browser-like "User-Agent" instead by passing a dict of custom headers to your requests.get(..) call. The request will then look like it comes from a real browser, and you'll receive the normal response.

For example:

>>> import requests
>>> url = "https://www.transfermarkt.com/jumplist/startseite/wettbewerb/GB1"
>>> headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}

# Make request with "User-Agent" Header
>>> response = requests.get(url, headers=headers)
>>> response.status_code
200   # success response

>>> response.text  # will return the website content
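If you scrape several pages, a minimal sketch (not part of the original answer) is to set the header once on a requests.Session, which merges its own headers into every request it sends. It also prints the library's default User-Agent for comparison. BROWSER_UA and fetch() are hypothetical names; the UA string is copied from the example above, and the 10-second timeout is an arbitrary choice.

```python
import requests

# Browser User-Agent copied from the example above.
# Assumption: other realistic browser UA strings should behave similarly.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
)

# What requests sends when you pass no headers: "python-requests/<version>".
default_ua = requests.utils.default_user_agent()
print("default User-Agent:", default_ua)

# A Session merges its own headers into every request it sends,
# so the User-Agent only has to be set once.
session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})


def fetch(url: str) -> requests.Response:
    """Hypothetical helper: GET url through the shared session.

    The timeout (in seconds) is an arbitrary choice so that a slow site
    cannot hang the scraper indefinitely.
    """
    return session.get(url, timeout=10)
```

For example, fetch(url).status_code should then behave like the headers=headers call above; the actual result depends on how the site treats the request when you run it.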

Nishant Nischal Chintalapati · Answer

Some websites block requests that do not appear to come from a browser. So you need to send a "user-agent" header naming the browser and operating system, which makes the request look like it comes from a browser rather than from a scraping script.

Use this in your code:

import requests

url = "https://www.transfermarkt.com/jumplist/startseite/wettbewerb/GB1"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'}

response = requests.get(url, headers=headers)

See if this helps.
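HTTP header names are case-insensitive, and requests stores headers in a case-insensitive dict, so the lowercase 'user-agent' key in this answer is equivalent to 'User-Agent' in the accepted answer. A minimal sketch, assuming you want to confirm which User-Agent would go out without contacting the site: it builds the request through Session.prepare_request(), the same step requests.get() uses internally, and inspects the headers without sending anything.

```python
import requests

url = "https://www.transfermarkt.com/jumplist/startseite/wettbewerb/GB1"
headers = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
    )
}

# Build the request the way requests.get(url, headers=headers) does,
# but stop before sending it, so the outgoing headers can be inspected offline.
with requests.Session() as session:
    prepared = session.prepare_request(requests.Request("GET", url, headers=headers))

# The lowercase key replaced the session's default "python-requests/..."
# User-Agent, and lookups ignore case.
print(prepared.headers["User-Agent"])
print(prepared.headers["USER-AGENT"] == prepared.headers["user-agent"])  # True
```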

Tags: python, python-3.x, python-requests, web-scraping

Paul Vannan
