
How to fix "403 Forbidden" errors with Python requests even with User-Agent headers?

I am sending a request to a URL. I copied the browser's curl command and converted it to Python, so all the headers are included, but the request is not working: I receive status code 403 and error code 1020 in the HTML output.

The code is

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    # 'Accept-Encoding': 'gzip, deflate, br',
    'DNT': '1',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
}

response = requests.get('https://v2.gcchmc.org/book-appointment/', headers=headers)

print(response.status_code)
print(response.cookies.get_dict())
with open("test.html",'w') as f:
    f.write(response.text)

I do receive cookies, but not the desired response. I know I could do this with Selenium, but I want to understand the reason behind the block.

Note:
I have installed all the libraries and checked their versions, but it still fails with a 403 error.

farhan jatt asked Dec 31 '25 06:12

1 Answer

The site is protected by Cloudflare, which aims to block, among other things, unauthorized data scraping. From What is data scraping?

The process of web scraping is fairly simple, though the implementation can be complex. Web scraping occurs in 3 steps:

  1. First the piece of code used to pull the information, which we call a scraper bot, sends an HTTP GET request to a specific website.
  2. When the website responds, the scraper parses the HTML document for a specific pattern of data.
  3. Once the data is extracted, it is converted into whatever specific format the scraper bot’s author designed.
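As an illustration, the three steps above can be sketched in Python. This is a minimal, self-contained sketch: step 1 is replaced by a hard-coded HTML snippet (the class name `appointment` and the sample data are invented for the example), so it runs without any network access.

```python
from html.parser import HTMLParser
import json

# Step 1 would normally be an HTTP GET request to the target site;
# here a canned response stands in so the sketch runs offline.
html_doc = """
<html><body>
  <div class="appointment">Dr. Smith - 09:00</div>
  <div class="appointment">Dr. Jones - 10:30</div>
</body></html>
"""

# Step 2: parse the HTML document for a specific pattern of data
# (here: the text of every <div class="appointment">).
class AppointmentParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_target = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "appointment") in attrs:
            self.in_target = True

    def handle_data(self, data):
        if self.in_target:
            self.items.append(data.strip())
            self.in_target = False

parser = AppointmentParser()
parser.feed(html_doc)

# Step 3: convert the extracted data into the format the author wants (JSON here).
print(json.dumps(parser.items))
```

Real scrapers do the same thing at scale, which is exactly the traffic pattern Cloudflare tries to detect and block.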

You can use urllib instead of requests; it seems to be able to get past Cloudflare here:

import urllib.request

req = urllib.request.Request('https://v2.gcchmc.org/book-appointment/')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0')
req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8')
req.add_header('Accept-Language', 'en-US,en;q=0.5')

r = urllib.request.urlopen(req).read().decode('utf-8')
with open("test.html", 'w', encoding="utf-8") as f:
    f.write(r)
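As a side note, `urllib.request.Request` also accepts a whole headers dict, so the header block from the question can be reused directly instead of repeated `add_header` calls. A sketch (the actual fetch is commented out so this runs offline; the URL is the one from the question):

```python
import urllib.request

# Reuse the question's header dict directly (a few representative entries shown).
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:106.0) Gecko/20100101 Firefox/106.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
}

req = urllib.request.Request('https://v2.gcchmc.org/book-appointment/', headers=headers)

# The actual fetch, exactly as in the answer above:
# with urllib.request.urlopen(req) as resp:
#     html = resp.read().decode('utf-8')

# urllib normalizes header names to capitalized form internally.
print(req.get_header('User-agent'))
```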
Guy answered Jan 02 '26 23:01

