requests.get returns 403 while the same url works in browser

Tags: python, python-3.x, unicode, python-requests

I'm trying to use the search form at rlsnet.ru. Here is the form's definition I've extracted from the source file:

<form id="site_search_form" action="/search_result.htm" method="get">
    <input id="simplesearch_text_input" class="search__field" type="text" name="word" value="" autocomplete="off">
    <input type="hidden" name="path" value="/" id="path">
    <input type="hidden" name="enter_clicked" value="1">
    <input id="letters_id" type="hidden" name="letters" value="">
    <input type="submit" class="g-btn search__btn" value="Найти" id="simplesearch_button">
    <div class="sf_suggestion">
        <ul style="display: none; z-index:1000; opacity:0.85;">
        </ul>
    </div>
    <div id="contentsf">
    </div>
</form>

Here is the code I used to send the search request:

import requests
from urllib.parse import urlencode 

root = "http://www.rlsnet.ru/search_result.htm?"
response = requests.get(root + urlencode({"word": "Церебролизин".encode('cp1251')}))

Each time I do it, the response status is 403. When I enter the same request URL (i.e. http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED) into Safari/Chrome/Opera, it works fine and returns the expected page. What am I doing wrong? Googling the issue only turned up one SO question, "why url works in browser but not using requests get method", which was of little use.


asked Jan 30 '17 22:01

Eli Korvigo

1 Answer

That's because the default User-Agent of requests is python-requests/2.13.0 (the suffix is whatever version you have installed), and this website apparently doesn't like traffic from "non-browsers", so it blocks such requests.

>>> import requests
>>> session = requests.Session()
>>> session.headers
{'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'User-Agent': 'python-requests/2.13.0'}
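
If you want to check what a particular request would send without actually hitting the server, one option is to prepare the request and look at its headers. This is only a sketch: it assumes requests is installed, and the helper name preview_headers is made up for illustration.

```python
import requests


def preview_headers(url, extra_headers=None):
    """Return the headers requests would send for a GET to `url`, without sending it."""
    with requests.Session() as session:
        request = requests.Request("GET", url, headers=extra_headers)
        # prepare_request merges the session defaults with the per-request headers.
        prepared = session.prepare_request(request)
    return prepared.headers


default_headers = preview_headers("http://www.rlsnet.ru/search_result.htm")
print(default_headers["User-Agent"])  # python-requests/<installed version>
```

Passing extra_headers={'User-Agent': '...'} to the same helper shows the per-request value replacing the session default.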

All you need to do is make the request look like it comes from a browser, so just pass an extra headers parameter:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'} # This is Chrome; you can use any browser's User-Agent string
response = requests.get('http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED', headers=headers)

print(response.status_code)
print(response.url)

200 
http://www.rlsnet.ru/search_result.htm?word=%D6%E5%F0%E5%E1%F0%EE%EB%E8%E7%E8%ED
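
If you send more than one search, it may be more convenient to set the header once on a Session and build the query from the word itself. The following is a rough sketch rather than a tested recipe for this site: the names build_search_url and fetch_search_page are invented here, the timeout value is arbitrary, and whether the server still accepts this User-Agent is an assumption.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://www.rlsnet.ru/search_result.htm"

# Assumption: any mainstream browser User-Agent will do; this one is the
# Chrome string used earlier in this answer.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36"
}


def build_search_url(word):
    """Return the search URL with `word` percent-encoded from its cp1251 bytes."""
    # urlencode percent-encodes bytes values byte by byte, so encoding to
    # cp1251 first gives the same query string as in the question.
    return SEARCH_URL + "?" + urlencode({"word": word.encode("cp1251")})


def fetch_search_page(word, timeout=10):
    """Send the search request with browser-like headers (does network I/O)."""
    # Third-party dependency, imported here so build_search_url stays usable
    # even where requests is not installed.
    import requests

    with requests.Session() as session:
        session.headers.update(BROWSER_HEADERS)
        return session.get(build_search_url(word), timeout=timeout)
```

Calling fetch_search_page("Церебролизин") then returns the response object, so you can check its status_code as above.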


answered Sep 29 '22 10:09

Shane
