I am building a local response cache. I created an item Pipeline because I need to store information about each item, keyed by its ID, as it is collected from the site. I also need a Downloader Middleware: depending on the IDs I have already stored, I don't want to hit the site with a new Request, so I intercept the Request before it is sent to the server, check whether its ID already exists in my cache, and if it does, serve the cached item instead.
As you can see, the Pipeline and the Middleware need to work together, so keeping them separate doesn't seem very clean (they also share variables that should exist only once). However, when I register both in their respective settings:
DOWNLOADER_MIDDLEWARES = {
    'myproject.urlcache.CachePipelineMiddleware': 1,
}

ITEM_PIPELINES = {
    'myproject.urlcache.CachePipelineMiddleware': 800,
}
I get two different instances (a log message in the constructor shows it being created twice). How can I make sure only one instance is created, without breaking the Pipeline and Downloader Middleware functionality of my project?
I just realized this is a simple Singleton problem, and Scrapy works fine when the Pipeline and the Middleware are the same instance. I create this Singleton metaclass first:
class Singleton(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]
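As a quick sanity check, here is a minimal sketch of the metaclass in action (the Cache class name is purely illustrative, and metaclass=Singleton is the Python 3 spelling):

class Cache(metaclass=Singleton):
    def __init__(self):
        self.seen_ids = set()

a = Cache()
b = Cache()
assert a is b  # both names point at the single shared instance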
Then, in the combined Pipeline/Middleware class, I add the following:
class CachePipelineMiddleware(object):
    __metaclass__ = Singleton  # Python 2 syntax; on Python 3 declare it as: class CachePipelineMiddleware(metaclass=Singleton)

    def process_item(self, item, spider):
        # works as an item Pipeline
        ...

    def process_request(self, request, spider):
        # works as a Downloader Middleware
        ...
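For illustration only, here is a minimal sketch of how the two hooks could share the cache. The 'id' item field, the 'item_id' request.meta key, and the choice to raise IgnoreRequest (rather than building a Response from cached data) are my own assumptions, not part of the original code:

from scrapy.exceptions import IgnoreRequest


class CachePipelineMiddleware(object):
    __metaclass__ = Singleton  # Python 2; use metaclass=Singleton in the class statement on Python 3

    def __init__(self):
        # Shared state: both hooks below read and write this one dict,
        # because the singleton metaclass guarantees a single instance.
        self.cache = {}

    def process_item(self, item, spider):
        # Pipeline hook: remember the item under its ID (assumed field name).
        self.cache[item['id']] = dict(item)
        return item

    def process_request(self, request, spider):
        # Downloader middleware hook: the spider is assumed to put the ID
        # in request.meta before yielding the request.
        item_id = request.meta.get('item_id')
        if item_id is not None and item_id in self.cache:
            # Skip the download entirely; Scrapy also allows returning a
            # Response object here if you want to serve cached content.
            raise IgnoreRequest('ID %s already cached' % item_id)
        return None  # proceed with the normal download

Since the class defines no from_crawler, Scrapy instantiates it by calling the class directly for both settings entries, and the metaclass returns the one cached instance each time, so both hooks see the same cache dict.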