
If I get a 500 internal server error in Scrapy, how do I skip the URL?

I'm scraping data off several thousand pages with the general URL of:

http://example.com/database/?id=(some number)

where I am running through the id numbers.

I keep encountering huge chunks of URLs that generate a 500 internal server error, and Scrapy goes over these chunks several times for some reason. This eats up a lot of time, so I am wondering if there is a way to move on to the next URL immediately instead of having Scrapy resend the request several times.

asked May 22 '14 by galilei

1 Answer

The component that retries 500 errors is RetryMiddleware.

If you do not want Scrapy to retry requests that received a 500 status code, set RETRY_HTTP_CODES in your settings.py so that it no longer includes 500 (the default is [500, 502, 503, 504, 400, 408]), or disable RetryMiddleware altogether with RETRY_ENABLED = False.
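For example, in settings.py this could look as follows. This is a minimal sketch: the list below is just the default quoted above with 500 removed, and the exact default may differ across Scrapy versions.

    # settings.py

    # Drop 500 from the retry list so Scrapy moves on immediately,
    # while still retrying the other transient error codes.
    RETRY_HTTP_CODES = [502, 503, 504, 400, 408]

    # Alternatively, disable RetryMiddleware entirely:
    # RETRY_ENABLED = False

With 500 removed from the list, a 500 response is passed through (and dropped by default) rather than re-queued, so the spider proceeds to the next id without burning time on retries.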

See the RetryMiddleware settings in the Scrapy documentation for more.

answered Oct 06 '22 by paul trmbrth