 

Scrapy and proxies

Tags: python, scrapy

How do you utilize proxy support with the python web-scraping framework Scrapy?

asked Jan 17 '11 by no1

People also ask

Does Scrapy use proxy?

As a web scraping tool, Scrapy has support for proxies, and you will most likely make use of proxies in your scraping project.

How do you integrate a proxy in Scrapy?

Via request parameters: simply include the proxy connection details in the meta field of every request within your spider. Scrapy's HttpProxyMiddleware, which is enabled by default, will then route the request through the proxy you defined.
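For instance, a minimal sketch (the URL and proxy address are placeholders):

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"

        def start_requests(self):
            # HttpProxyMiddleware reads the "proxy" key from request.meta.
            yield scrapy.Request(
                "http://example.com",
                meta={"proxy": "http://proxy:port"},
            )

        def parse(self, response):
            pass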

What is middleware in Scrapy?

The spider middleware is a framework of hooks into Scrapy's spider processing mechanism where you can plug custom functionality to process the responses that are sent to Spiders for processing and to process the requests and items that are generated from spiders.
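As an illustration, a skeleton spider middleware might look like this (a minimal sketch; the class name is hypothetical, the hook signatures are Scrapy's):

    class MySpiderMiddleware:
        # Called for each response before it is handed to the spider.
        def process_spider_input(self, response, spider):
            return None  # None means "continue processing"

        # Called with the items and requests the spider yields.
        def process_spider_output(self, response, result, spider):
            for item_or_request in result:
                yield item_or_request

Note that proxy support itself is implemented as a downloader middleware (HttpProxyMiddleware), not a spider middleware.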


2 Answers

From the Scrapy FAQ,

Does Scrapy work with HTTP proxies?

Yes. Support for HTTP proxies is provided (since Scrapy 0.8) through the HTTP Proxy downloader middleware. See HttpProxyMiddleware.

The easiest way to use a proxy is to set the environment variable http_proxy. How this is done depends on your shell.

    C:\>set http_proxy=http://proxy:port
    csh% setenv http_proxy http://proxy:port
    sh$ export http_proxy=http://proxy:port

If you want to use an HTTPS proxy for visiting HTTPS sites, set the https_proxy environment variable instead:

    C:\>set https_proxy=https://proxy:port
    csh% setenv https_proxy https://proxy:port
    sh$ export https_proxy=https://proxy:port
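Scrapy's HttpProxyMiddleware picks these variables up through the standard library's proxy detection, so a quick sanity check is:

    from urllib.request import getproxies

    # Prints the proxies detected from the environment, e.g.
    # {'http': 'http://proxy:port', 'https': 'https://proxy:port'}
    print(getproxies())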
answered Oct 22 '22 by ephemient

Single Proxy

  1. Enable HttpProxyMiddleware in your settings.py, like this:

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 1
    }
  2. Pass the proxy to the request via request.meta:

    from scrapy import Request

    request = Request(url="http://example.com")
    request.meta['proxy'] = "http://host:port"  # include the scheme
    yield request
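If the proxy requires authentication, the credentials can be embedded in the proxy URL and HttpProxyMiddleware turns them into a Proxy-Authorization header (user and pass below are placeholders):

    request.meta['proxy'] = "http://user:pass@host:port"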

Multiple Proxies

You can also choose a proxy address at random if you have an address pool, like this:

import random

from scrapy.http import Request
from scrapy.spider import BaseSpider  # in modern Scrapy: from scrapy import Spider, Request


class MySpider(BaseSpider):
    name = "my_spider"

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.proxy_pool = ['proxy_address1', 'proxy_address2', ..., 'proxy_addressN']

    def parse(self, response):
        # ... parse code ...
        if something:
            yield self.get_request(url)

    def get_request(self, url):
        req = Request(url=url)
        if self.proxy_pool:
            req.meta['proxy'] = random.choice(self.proxy_pool)
        return req
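An alternative to picking the proxy inside the spider is to assign it once in a custom downloader middleware, so every request gets a proxy automatically. A minimal sketch, assuming a custom PROXY_POOL list in settings.py (the setting name and class name are hypothetical):

    import random

    class RandomProxyMiddleware:
        """Assign a random proxy from PROXY_POOL to each outgoing request."""

        def __init__(self, proxy_pool):
            self.proxy_pool = proxy_pool

        @classmethod
        def from_crawler(cls, crawler):
            # e.g. PROXY_POOL = ['http://proxy1:port', 'http://proxy2:port']
            return cls(crawler.settings.getlist('PROXY_POOL'))

        def process_request(self, request, spider):
            if self.proxy_pool and 'proxy' not in request.meta:
                request.meta['proxy'] = random.choice(self.proxy_pool)

Enable it in DOWNLOADER_MIDDLEWARES with a priority lower than HttpProxyMiddleware's default of 750 so it runs first, e.g. {'myproject.middlewares.RandomProxyMiddleware': 350} (the module path is hypothetical).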
answered Oct 22 '22 by Amom