I am trying out Mechanize to simplify some routine work. I have managed to bypass that error by using br.set_handle_robots(False). There is some debate about how ethical it is to do this. What I wonder about is where this error is generated: on my side, or on the server side? That is, does Mechanize throw the exception when it sees some robots.txt rule, or does the server decline the request when it detects that I am using an automation tool?
The error is generated on your side, by the client. Mechanize downloads robots.txt from the server, matches its own user-agent string against the rules in that file, and raises the exception itself before the request is ever sent; the server does not decline anything. By default, mechanize identifies itself as "Python-urllib/2.7", so any rule that applies to that user agent (or to "*") triggers the error.
See http://en.wikipedia.org/wiki/Robots_exclusion_standard
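To illustrate that this enforcement is purely client-side, here is a minimal sketch using Python's standard-library robots.txt parser (`urllib.robotparser`), which applies the same kind of rule matching mechanize does internally. The robots.txt content and URLs below are made up for the example; no network request is involved:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, parsed entirely locally.
# In real use, the client fetches this file from the server once,
# then applies the rules itself for every subsequent URL.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# The client decides, before sending any request, whether a URL is allowed.
print(rp.can_fetch("Python-urllib/2.7", "http://example.com/private/page"))  # False
print(rp.can_fetch("Python-urllib/2.7", "http://example.com/public/page"))   # True
```

Disabling the check (as `br.set_handle_robots(False)` does in mechanize) simply skips this client-side lookup; the server never knows the difference unless it inspects the user agent or traffic patterns separately.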