
requests.get returns 403 while the same URL works in a browser

I'm trying to use the search form at rlsnet.ru. Here is the form's definition I've extracted from the source file:

<form id="site_search_form" action="/search_result.htm" method="get">
    <input id="simplesearch_text_input" class="search__field" type="text" name="word" value="" autocomplete="off">
    <input type="hidden" name="path" value="/" id="path">
    <input type="hidden" name="enter_clicked" value="1">
    <input id="letters_id" type="hidden" name="letters" value="">
    <input type="submit" class="g-btn search__btn" value="Найти" id="simplesearch_button">
    <div class="sf_suggestion">
        <ul style="display: none; z-index:1000; opacity:0.85;">
        </ul>
    </div>
    <div id="contentsf">
    </div>
</form>

Here is the code I used to send the search request:

import requests
from urllib.parse import urlencode

root = "http://www.rlsnet.ru/search_result.htm?"
# Encode the search word as cp1251 bytes before URL-encoding it
response = requests.get(root + urlencode({"word": "Церебролизин".encode('cp1251')}))

Each time I do this, the response status is 403. When I enter the same request URL (i.e. http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED) into Safari/Chrome/Opera, it works fine and returns the expected page. What am I doing wrong? Googling the issue only turned up this SO question: "why url works in browser but not using requests get method", which was of little use.

Eli Korvigo asked Jan 30 '17 22:01





1 Answer

That's because the default User-Agent sent by requests is python-requests/2.13.0 (the version number depends on your installation), and this website doesn't like traffic from "non-browsers", so it tries to block it.

>>> import requests
>>> session = requests.Session()
>>> session.headers
{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0'}

All you need to do is make the request look like it comes from a browser, so just pass an extra headers parameter:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'} # This is chrome, you can set whatever browser you like
response = requests.get('http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED', headers=headers)

print(response.status_code)
print(response.url)

200 
http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED
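
To tie the question and the answer together, here is a minimal sketch that builds the cp1251-encoded search URL and sends it with a browser-like User-Agent. The names SEARCH_URL, BROWSER_HEADERS, build_search_url and search are made up for illustration, and the timeout value is an arbitrary assumption. Whether the site keeps accepting this header is up to the site, so treat it as a starting point rather than a guaranteed fix.

```python
from urllib.parse import urlencode

# Hypothetical names for illustration; the URL and User-Agent come from the post above.
SEARCH_URL = "http://www.rlsnet.ru/search_result.htm"
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36"
    )
}


def build_search_url(word):
    """Return the search URL with the word percent-encoded from cp1251 bytes."""
    query = urlencode({"word": word.encode("cp1251")})
    return SEARCH_URL + "?" + query


def search(word, timeout=10):
    """Send the search request with a browser-like User-Agent header."""
    # Imported here so build_search_url works even without requests installed.
    import requests

    response = requests.get(
        build_search_url(word), headers=BROWSER_HEADERS, timeout=timeout
    )
    # Raise an HTTPError if the server still answers with 4xx/5xx (e.g. 403).
    response.raise_for_status()
    return response
```

Calling build_search_url("Церебролизин") gives the same URL that works in the browser, and search("Церебролизин") sends it with the extra header. If the site still answers 403, copying more request headers from the browser's developer tools (for example Referer) is the usual next step.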
Shane answered Sep 29 '22 10:09
