 

Scrapy Django Limit links crawled

I just got Scrapy set up and running, and it works great, but I have two (noob) questions. I should say up front that I am completely new to Scrapy and to crawling sites.

  1. Can you limit the number of links crawled? I have a site that doesn't use pagination and just lists a lot of links (which I crawl) on their home page. I feel bad crawling all of those links when I really just need to crawl the first 10 or so.

  2. How do you run multiple spiders at once? Right now I am using the command scrapy crawl example.com, but I also have spiders for example2.com and example3.com. I would like to run all of my spiders using one command. Is this possible?

asked Nov 24 '10 by imns


1 Answer

For #1: don't use the rules attribute to extract and follow links. Instead, write the selection logic in your spider's parse method and yield (or return) Request objects only for the links you actually want to follow.

For #2: try scrapyd, a service for deploying Scrapy projects and scheduling spiders, which lets you queue and run multiple spiders on one box.
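A rough sketch of the scrapyd workflow, assuming a project named myproject containing the three spiders from the question (the project name and deploy target are placeholders; run the deploy command from the Scrapy project directory):

```shell
# Install the scrapyd server and the deploy client.
pip install scrapyd scrapyd-client

# Start the scrapyd server (listens on port 6800 by default).
scrapyd &

# Deploy the project to the "default" target configured in scrapy.cfg.
scrapyd-deploy default -p myproject

# Schedule each spider through scrapyd's HTTP JSON API.
curl http://localhost:6800/schedule.json -d project=myproject -d spider=example.com
curl http://localhost:6800/schedule.json -d project=myproject -d spider=example2.com
curl http://localhost:6800/schedule.json -d project=myproject -d spider=example3.com
```

scrapyd runs the scheduled jobs itself, so the three curl calls return immediately and the spiders run concurrently under the server, which is what makes this a one-command-per-spider (and easily scriptable) way to kick off everything at once.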

answered Oct 11 '22 by Jet Guo