On what side is 'HTTP Error 403: request disallowed by robots.txt' generated?

I am trying out Mechanize to simplify some routine tasks. I managed to bypass that error by using br.set_handle_robots(False). There is some debate about how ethical it is to do that. What I wonder is where this error is generated: on my side, or on the server side? In other words, does Mechanize throw the exception when it sees a robots.txt rule, or does the server decline the request when it detects that I am using an automation tool?

Asked Mar 18 '26 by Sergei Basharov

1 Answer

The error is generated on your side. Mechanize downloads the server's robots.txt and, if its user agent matches an entry there, applies the rules itself and raises the exception; the server does not decline the request. By default, mechanize identifies itself as "Python-urllib/2.7".

See http://en.wikipedia.org/wiki/Robots_exclusion_standard

Answered Mar 20 '26 by Gilles Quenot


