I've written a script in Python using Scrapy to send a request to a webpage through a proxy without changing anything in settings.py or DOWNLOADER_MIDDLEWARES. It is working great now. However, the only thing I can't make use of is creating a list of proxies so that if one fails, another will be in use. How can I tweak this portion, os.environ["http_proxy"] = "http://176.58.125.65:80", to get a list of proxies to try one by one, as it supports only one? Any help on this will be highly appreciated.
This is what I've tried so far (the working version):
import scrapy, os
from scrapy.crawler import CrawlerProcess

class ProxyCheckerSpider(scrapy.Spider):
    name = 'lagado'
    start_urls = ['http://www.lagado.com/proxy-test']
    os.environ["http_proxy"] = "http://176.58.125.65:80"  # can't modify this portion to get a list of proxies

    def parse(self, response):
        stat = response.css(".main-panel p::text").extract()[1:3]
        yield {"Proxy-Status": stat}

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})
c.crawl(ProxyCheckerSpider)
c.start()
I do not want to change anything in settings.py or create any custom middleware to serve the purpose. I wish to achieve the same thing (externally) as I did above with a single proxy. Thanks.
You can also set the meta key proxy per request, to a value like http://some_proxy_server:port or http://username:password@some_proxy_server:port, per the official docs: https://doc.scrapy.org/en/latest/topics/downloader-middleware.html#scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware
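For example, here is a minimal sketch of the same spider rotating through a list of proxies purely via request.meta, with no settings.py changes and no custom middleware (the second proxy address is a placeholder):

import scrapy
from scrapy.crawler import CrawlerProcess

# Hypothetical proxy pool; replace with real addresses.
PROXIES = [
    "http://176.58.125.65:80",
    "http://another.proxy.example:8080",  # placeholder
]

class ProxyCheckerSpider(scrapy.Spider):
    name = 'lagado'

    def start_requests(self):
        # Issue one request per proxy; HttpProxyMiddleware (enabled by
        # default) picks up request.meta['proxy'] automatically.
        for proxy in PROXIES:
            yield scrapy.Request(
                'http://www.lagado.com/proxy-test',
                meta={'proxy': proxy},
                dont_filter=True,  # fetch the same URL once per proxy
            )

    def parse(self, response):
        stat = response.css(".main-panel p::text").extract()[1:3]
        yield {"Proxy-Status": stat, "Proxy-Used": response.meta.get('proxy')}

c = CrawlerProcess({'USER_AGENT': 'Mozilla/5.0'})
c.crawl(ProxyCheckerSpider)
c.start()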
If you want automatic fallback rather than a fixed schedule, you need to write your own middleware that replaces the request.meta['proxy'] value with a new proxy IP; a rough sketch follows.
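A minimal sketch of such a middleware (the class name is hypothetical, and failure/ban handling is omitted):

import random

# Hypothetical proxy pool; replace with real addresses.
PROXIES = [
    "http://176.58.125.65:80",
    "http://another.proxy.example:8080",  # placeholder
]

class RandomProxyMiddleware:
    # Assigns a random proxy from the pool to every outgoing request.
    def process_request(self, request, spider):
        request.meta['proxy'] = random.choice(PROXIES)

Note that a custom middleware still has to be enabled through DOWNLOADER_MIDDLEWARES (e.g., in the dict passed to CrawlerProcess), so if you want to avoid that entirely, the per-request meta approach above is the closer fit.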
Alternatively, you can look into Scrapy extension packages that are already made to solve this: https://github.com/TeamHG-Memex/scrapy-rotating-proxies
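If you go that route, the package's README documents usage roughly like this (the ROTATING_PROXY_LIST setting and middleware names come from that project; here they are passed through CrawlerProcess instead of settings.py, and the proxy addresses are placeholders):

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    # Settings taken from the scrapy-rotating-proxies README:
    'ROTATING_PROXY_LIST': [
        'http://176.58.125.65:80',
        'http://another.proxy.example:8080',  # placeholder
    ],
    'DOWNLOADER_MIDDLEWARES': {
        'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
        'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    },
})

The package also detects dead or banned proxies and retries requests through the remaining ones, which covers the "if one fails, another will be in use" requirement.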