I am trying to crawl a forum website with Scrapy. The crawler works fine if I have
CONCURRENT_REQUESTS = 1
But if I increase that number then I get this error
2012-12-21 05:04:36+0800 [working] DEBUG: Retrying <GET http://www.example.com/profile.php?id=1580> (failed 1 times): 503 Service Unavailable
I want to know whether the forum is blocking the requests or whether there is a problem with my settings.
HTTP status code 503, "Service Unavailable", means that (for some reason) the server wasn't able to process your request. It's usually a transient error. If you want to know whether you have been blocked, just try again in a little while and see what happens.
It could also mean that you're fetching pages too quickly. The fix is to stop doing that: keep concurrent requests at 1 (and possibly add a delay). Be polite.
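As a rough sketch, the advice above could look like this in a Scrapy project's settings.py. The setting names are standard Scrapy settings, but the specific values are assumptions chosen for illustration, not numbers known to suit this forum.

```python
# settings.py (values are illustrative assumptions; tune them for the site)
CONCURRENT_REQUESTS = 1                  # one request in flight at a time, as in the question
DOWNLOAD_DELAY = 2                       # seconds to wait between requests to the same site
AUTOTHROTTLE_ENABLED = True              # let Scrapy adapt the delay to server response times
RETRY_TIMES = 3                          # retries per request, in addition to the first attempt
RETRY_HTTP_CODES = [500, 502, 503, 504]  # statuses treated as transient and retried
```

If the 503s stop with a delay in place, rate limiting on the server side is the more likely explanation than an outright block.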
You will also run into various errors if you scrape enough pages. Just make sure that your crawler can handle them.
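Scrapy's built-in retry middleware already retries responses like the 503 in the log above, so inside a spider you mostly need to tune it. To illustrate the general idea of handling transient errors, here is a minimal standard-library sketch. The function name `fetch_with_retry`, its parameters, and the set of status codes treated as transient are all hypothetical choices made for this example.

```python
import time
import urllib.error
import urllib.request

# Status codes treated as transient here; this set is an assumption.
TRANSIENT_STATUSES = {500, 502, 503, 504}


def fetch_with_retry(url, fetch=None, max_tries=3, base_delay=5.0, sleep=time.sleep):
    """Return the response body, retrying transient HTTP errors with a growing delay."""
    if fetch is None:
        # Default fetcher: a plain blocking GET using the standard library.
        fetch = lambda u: urllib.request.urlopen(u, timeout=30).read()
    for attempt in range(1, max_tries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            # Give up on non-transient errors, or once the attempts are used up.
            if err.code not in TRANSIENT_STATUSES or attempt == max_tries:
                raise
            # Wait longer after each failure: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Anything that is not a transient status (a 404, for example) is raised immediately, so the caller can log it and move on rather than hammering the server.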