How to use Downloader Middleware in Scrapy

Question

I am using Scrapy to scrape some web pages. I wrote a custom ProxyMiddleware class and implemented my requirement in its process_request(self, request, spider) method. Here is my code (copied):

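import random

from scrapy.downloadermiddlewares.httpproxy import HttpProxyMiddleware


class ProxyMiddleware(HttpProxyMiddleware):
    proxy_list = []  # list of proxies goes here

    def __init__(self, proxy_ip=''):
        self.proxy_ip = proxy_ip

    def process_request(self, request, spider):
        # Pick a random proxy for this request, if any are configured
        ip = random.choice(self.proxy_list) if self.proxy_list else None
        if ip:
            request.meta['proxy'] = ip
        # Return None so Scrapy keeps processing this request;
        # returning the request itself would reschedule it
        return None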

What I don't understand is how Scrapy will use my implementation instead of the default class. After some searching, my understanding is that I need to change settings.py:

DOWNLOADER_MIDDLEWARES = {
    'IPProxy.middlewares.MyCustomDownloaderMiddleware': 543,
    'IPProxy.IPProxy.spiders.RandomProxy': 600
}

To help readers understand my project structure, I added the second entry to the dict with an arbitrary value. My project structure is:

(Screenshot of the project structure; image not available.)

My questions are:

How do I use DOWNLOADER_MIDDLEWARES in settings.py correctly?
How do I assign the values to the entries in DOWNLOADER_MIDDLEWARES?
How do I make Scrapy call my custom code instead of the default?

alecxe · Accepted Answer

If you want to disable the built-in HttpProxyMiddleware downloader middleware (assuming that is the default you mean), set its value in DOWNLOADER_MIDDLEWARES to None:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': None,
    'IPProxy.middlewares.MyCustomDownloaderMiddleware': 543,
    'IPProxy.IPProxy.spiders.RandomProxy': 600
}
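
To tie this back to the three questions: each key in DOWNLOADER_MIDDLEWARES is the import path of a middleware class, and each integer is its position in the chain (process_request hooks run in ascending order, process_response hooks in descending order). Scrapy only calls classes that are listed there, so the custom class has to be registered under the path where it actually lives. Below is a minimal sketch of how that could look. It assumes the class is named RandomProxyMiddleware and sits in IPProxy/middlewares.py; that name, the PROXY_LIST setting and the proxy URL are hypothetical placeholders, not Scrapy built-ins.

```python
import random


class RandomProxyMiddleware:
    """Downloader middleware that assigns a random proxy to each request.

    A downloader middleware is a plain class; it does not need to
    inherit from anything in Scrapy.
    """

    def __init__(self, proxy_list):
        self.proxy_list = list(proxy_list)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, not a Scrapy built-in
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        if self.proxy_list:
            request.meta["proxy"] = random.choice(self.proxy_list)
        # Returning None lets the request continue through the
        # remaining middlewares and on to the downloader
        return None


# settings.py (the path assumes the class lives in IPProxy/middlewares.py)
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": None,
    "IPProxy.middlewares.RandomProxyMiddleware": 543,
}
PROXY_LIST = ["http://127.0.0.1:8080"]  # placeholder proxy URL
```

Reading the proxies from a setting in from_crawler is only one option; a class attribute, as in the question, would also work. Any entry whose path does not point at a real class, such as 'IPProxy.IPProxy.spiders.RandomProxy' if no such class exists, should be removed, because Scrapy will fail to import it at startup.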

Tags: python, python-2.7, scrapy, scrapy-spider

Jack Daniel
