I am creating a local response cache, for which I am writing an Item Pipeline that stores information about each item, keyed by its ID, as it is collected from a site.
I also need a Downloader Middleware: depending on the IDs I have already stored, I don't want to hit the site with a new request. So I intercept the Request before it is sent to the server, check whether its ID already exists in my cache, and if it does, return the corresponding item from the cache instead.
As you can see, the Pipeline and the Middleware need to work together, so separating them doesn't seem very clean (they also share variables that should exist only once), but when I set up both in their respective settings:
DOWNLOADER_MIDDLEWARES = {
    'myproject.urlcache.CachePipelineMiddleware': 1,
}
ITEM_PIPELINES = {
    'myproject.urlcache.CachePipelineMiddleware': 800,
}
I get two different instances (I log a message in the constructor, so I can see it being created twice).
How can I make sure only one instance gets created, without conflicting with the Pipeline and Downloader Middleware functionality of my project?
I just realized this is a simple Singleton problem, and Scrapy will happily work with the same instance serving as both the Pipeline and the Middleware.
First I create this Singleton metaclass:
class Singleton(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]
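As a quick, Scrapy-independent sanity check (the Cache class here is purely illustrative), instantiating a class that uses this metaclass twice yields the same object with shared state:

class Cache(metaclass=Singleton):
    def __init__(self):
        self.data = {}

a = Cache()
b = Cache()
assert a is b                            # both names refer to the same instance
a.data['42'] = 'cached item'
assert b.data['42'] == 'cached item'     # state is shared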
Then, in the class that acts as both the Pipeline and the Middleware, I set it as the metaclass (in Python 3 the metaclass is passed as a keyword argument in the class statement):
class CachePipelineMiddleware(metaclass=Singleton):
    def process_item(self, item, spider):
        # it works as a Pipeline
        ...

    def process_request(self, request, spider):
        # it works as a Middleware
        ...
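Fleshing out the skeleton, a minimal sketch of the shared cache could look like the following. The id item field and the item_id request meta key are assumptions on my part; also note that process_request cannot return an item directly (it may only return None, a Response, or a Request), so one option is to raise IgnoreRequest to drop the duplicate request:

from scrapy.exceptions import IgnoreRequest


class CachePipelineMiddleware(metaclass=Singleton):
    def __init__(self):
        # Shared by the Pipeline and the Middleware, because both
        # settings entries resolve to this single instance.
        self.cache = {}

    def process_item(self, item, spider):
        # Pipeline side: remember the scraped item under its ID
        # (assumes the item has an 'id' field).
        self.cache[item['id']] = item
        return item

    def process_request(self, request, spider):
        # Middleware side: if this ID is already cached, drop the
        # request instead of hitting the site again (assumes the
        # spider put the ID into request.meta['item_id']).
        item_id = request.meta.get('item_id')
        if item_id is not None and item_id in self.cache:
            raise IgnoreRequest('item %s already cached' % item_id)
        return None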