 

Enabling deltafetch in scrapy

I have worked with Scrapy a bit and my spider is now ready. Next, I want the spider to skip items it already scraped in its previous run and scrape only new content. That should reduce the spider's runtime.

While reading about this I came across deltafetch, which I think meets my requirement, but I have not been able to import it. I would be glad if anybody could guide me through using it step by step.

I would also be interested to know of any other middleware that serves a similar purpose.

Ashwin Rao asked Mar 21 '23 14:03



1 Answer

Using standard tools:

pip install scrapylib

Then add this to your project's settings.py:

SPIDER_MIDDLEWARES = {
    'scrapylib.deltafetch.DeltaFetch': 100,
}

DELTAFETCH_ENABLED = True
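
To make the idea concrete, here is a rough, standard-library-only sketch of the "skip what was scraped last time" behaviour you describe. It is not scrapylib's code and does not use Scrapy at all; the names DeltaFilter, fingerprint and crawl, and the JSON state file, are all hypothetical and chosen only for illustration.

```python
import hashlib
import json
import os


def fingerprint(url):
    """Return a stable key for a URL (hypothetical helper)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


class DeltaFilter:
    """Remembers, in a JSON file, which URLs were scraped in earlier runs."""

    def __init__(self, path):
        self.path = path
        self.seen = set()
        # Load the state left behind by a previous run, if there is one.
        if os.path.exists(path):
            with open(path) as f:
                self.seen = set(json.load(f))

    def should_fetch(self, url):
        return fingerprint(url) not in self.seen

    def mark_scraped(self, url):
        self.seen.add(fingerprint(url))

    def close(self):
        # Persist the state so the next run can skip these URLs.
        with open(self.path, "w") as f:
            json.dump(sorted(self.seen), f)


def crawl(urls, state_path):
    """Return the URLs handled in this run, skipping ones seen before."""
    delta = DeltaFilter(state_path)
    fetched = []
    for url in urls:
        if delta.should_fetch(url):
            fetched.append(url)  # a real spider would download and parse here
            delta.mark_scraped(url)
    delta.close()
    return fetched
```

Calling crawl twice with the same state file returns only the URLs that were not handled the first time. With the middleware enabled as above you should not need to write anything like this yourself; the sketch only shows why the second run gets shorter.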
PauliusZ answered Mar 31 '23 14:03
