I am creating a local response cache, for which I am writing an Item Pipeline that stores information about each item, keyed by its ID, as it is collected from a site.
I also need a Downloader Middleware: depending on the IDs I have already stored, I don't want to hit the site with a new request. So I intercept the Request before it is sent to the server, check whether its ID already exists in my cache, and if it does, return the corresponding item from the cache instead.
As you can see, the Pipeline and the Middleware need to work together, so separating them doesn't seem very clean (they also share variables that should exist only once), but when I set up both in their respective settings:
DOWNLOADER_MIDDLEWARES = {
    'myproject.urlcache.CachePipelineMiddleware': 1,
}
ITEM_PIPELINES = {
    'myproject.urlcache.CachePipelineMiddleware': 800,
}
I get two different instances (I log a message in the constructor, so I can see it being created twice).
How can I make sure only one instance gets created, without conflicting with the Pipeline and Downloader Middleware functionality of my project?
I just realized this is a simple Singleton problem, and Scrapy will happily work with the same instance serving as both the Pipeline and the Middleware.
First I create this Singleton metaclass:
class Singleton(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]
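As a quick, Scrapy-independent sanity check (the Cache class here is purely illustrative), instantiating a class that uses this metaclass twice yields the same object with shared state:

class Cache(metaclass=Singleton):
    def __init__(self):
        self.data = {}

a = Cache()
b = Cache()
assert a is b                            # both names refer to the same instance
a.data['42'] = 'cached item'
assert b.data['42'] == 'cached item'     # state is shared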
Then, in the class that acts as both the Pipeline and the Middleware, I set it as the metaclass (in Python 3 the metaclass is passed as a keyword argument in the class statement):
class CachePipelineMiddleware(metaclass=Singleton):
    def process_item(self, item, spider):
        # it works as a Pipeline
        ...

    def process_request(self, request, spider):
        # it works as a Middleware
        ...
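Fleshing out the skeleton, a minimal sketch of the shared cache could look like the following. The id item field and the item_id request meta key are assumptions on my part; also note that process_request cannot return an item directly (it may only return None, a Response, or a Request), so one option is to raise IgnoreRequest to drop the duplicate request:

from scrapy.exceptions import IgnoreRequest


class CachePipelineMiddleware(metaclass=Singleton):
    def __init__(self):
        # Shared by the Pipeline and the Middleware, because both
        # settings entries resolve to this single instance.
        self.cache = {}

    def process_item(self, item, spider):
        # Pipeline side: remember the scraped item under its ID
        # (assumes the item has an 'id' field).
        self.cache[item['id']] = item
        return item

    def process_request(self, request, spider):
        # Middleware side: if this ID is already cached, drop the
        # request instead of hitting the site again (assumes the
        # spider put the ID into request.meta['item_id']).
        item_id = request.meta.get('item_id')
        if item_id is not None and item_id in self.cache:
            raise IgnoreRequest('item %s already cached' % item_id)
        return None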