I've written a script in Python using Scrapy to send a request to a webpage through a proxy without changing anything in settings.py or DOWNLOADER_MIDDLEWARES. It is working great now. However, the only thing I can't make use of is creating a list of proxies so that if one fails, another will be in use. How can I tweak this portion, os.environ["http_proxy"] = "http://176.58.125.65:80", to get a list of proxies to try one by one, as it supports only one? Any help on this will be highly appreciated.
This is what I've tried so far (the working version):
import scrapy, os
from scrapy.crawler import CrawlerProcess

class ProxyCheckerSpider(scrapy.Spider):
    name = 'lagado'
    start_urls = ['http://www.lagado.com/proxy-test']
    os.environ["http_proxy"] = "http://176.58.125.65:80"  # can't modify this portion to get a list of proxies

    def parse(self, response):
        stat = response.css(".main-panel p::text").extract()[1:3]
        yield {"Proxy-Status": stat}

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})
c.crawl(ProxyCheckerSpider)
c.start()
I do not want to change anything in settings.py or create any custom middleware to serve the purpose. I wish to achieve the same thing (externally) as I did above with a single proxy. Thanks.
You can also set the meta key proxy per request, to a value like http://some_proxy_server:port or http://username:password@some_proxy_server:port, per the official docs: https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware
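For example, here is a minimal sketch of the same spider rotating through a list of proxies purely via request.meta, with no settings.py changes and no custom middleware (the second proxy address is a placeholder):

import scrapy
from scrapy.crawler import CrawlerProcess

# Hypothetical proxy pool; replace with real addresses.
PROXIES = [
    "http://176.58.125.65:80",
    "http://another.proxy.example:8080",  # placeholder
]

class ProxyCheckerSpider(scrapy.Spider):
    name = 'lagado'

    def start_requests(self):
        # Issue one request per proxy; HttpProxyMiddleware (enabled by
        # default) picks up request.meta['proxy'] automatically.
        for proxy in PROXIES:
            yield scrapy.Request(
                'http://www.lagado.com/proxy-test',
                meta={'proxy': proxy},
                dont_filter=True,  # fetch the same URL once per proxy
            )

    def parse(self, response):
        stat = response.css(".main-panel p::text").extract()[1:3]
        yield {"Proxy-Status": stat, "Proxy-Used": response.meta.get('proxy')}

c = CrawlerProcess({'USER_AGENT': 'Mozilla/5.0'})
c.crawl(ProxyCheckerSpider)
c.start()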
If you want automatic fallback rather than a fixed schedule, you need to write your own middleware that replaces the request.meta['proxy'] value with a new proxy IP; a rough sketch follows.
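A minimal sketch of such a middleware (the class name is hypothetical, and failure/ban handling is omitted):

import random

# Hypothetical proxy pool; replace with real addresses.
PROXIES = [
    "http://176.58.125.65:80",
    "http://another.proxy.example:8080",  # placeholder
]

class RandomProxyMiddleware:
    # Assigns a random proxy from the pool to every outgoing request.
    def process_request(self, request, spider):
        request.meta['proxy'] = random.choice(PROXIES)

Note that a custom middleware still has to be enabled through DOWNLOADER_MIDDLEWARES (e.g., in the dict passed to CrawlerProcess), so if you want to avoid that entirely, the per-request meta approach above is the closer fit.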
Alternatively, you can look into Scrapy extension packages that are already made to solve this: https://github.com/TeamHG-Memex/scrapy-rotating-proxies
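If you go that route, the package's README documents usage roughly like this (the ROTATING_PROXY_LIST setting and middleware names come from that project; here they are passed through CrawlerProcess instead of settings.py, and the proxy addresses are placeholders):

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    # Settings taken from the scrapy-rotating-proxies README:
    'ROTATING_PROXY_LIST': [
        'http://176.58.125.65:80',
        'http://another.proxy.example:8080',  # placeholder
    ],
    'DOWNLOADER_MIDDLEWARES': {
        'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
        'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    },
})

The package also detects dead or banned proxies and retries requests through the remaining ones, which covers the "if one fails, another will be in use" requirement.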