On what side is 'HTTP Error 403: request disallowed by robots.txt' generated?

I am trying out Mechanize to simplify some routine tasks. I managed to bypass that error by using br.set_handle_robots(False). There is some debate about how ethical it is to do that. What I wonder is where this error is generated: on my side, or on the server side? In other words, does Mechanize throw the exception when it sees a robots.txt rule, or does the server decline the request when it detects that I am using an automation tool?

Asked Mar 18 '26 by Sergei Basharov

1 Answer

The error is generated on your side. Mechanize downloads the server's robots.txt and, if its user agent matches an entry there, applies the rules itself and raises the exception; the server does not decline the request. By default, mechanize identifies itself as "Python-urllib/2.7".

See http://en.wikipedia.org/wiki/Robots_exclusion_standard

Answered Mar 20 '26 by Gilles Quenot


