
Running Multiple spiders in scrapy

  1. In Scrapy, suppose I have two URLs that contain different HTML. I want to write an individual spider for each and run both spiders at once. Is it possible in Scrapy to run multiple spiders at once?

  2. After writing multiple spiders in Scrapy, how can I schedule them to run every 6 hours (maybe like cron jobs)?

I have no idea how to do the above; can you suggest how to perform these things, with an example?

Thanks in advance.

Shiva Krishna Bavandla asked Jun 08 '12

2 Answers

It would probably be easiest to just run the two spiders at once from the OS level. They should both be able to save to the same database. Create a shell script that calls both spiders so they run at the same time:

#!/bin/sh
scrapy runspider foo.py &
scrapy runspider bar.py
wait

Be sure to make this script executable with chmod +x script_name.

To schedule a cronjob every 6 hours, type crontab -e into your terminal, and edit the file as follows:

0 */6 * * * path/to/shell/script_name >> path/to/file.log

The first field is minutes, then hours, and so on, and an asterisk is a wildcard. Note the 0 in the minutes field: with an asterisk there instead, the job would run every minute of every sixth hour. As written, it runs once at the top of every hour divisible by 6, i.e. every six hours (00:00, 06:00, 12:00, 18:00).
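As a quick sanity check on the */6 hour field, this snippet (plain Python, not a real cron parser) lists the hours it matches:

```python
# Hours matched by the cron hour field "*/6": every hour divisible by 6.
matched = [hour for hour in range(24) if hour % 6 == 0]
print(matched)  # prints [0, 6, 12, 18]
```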

foxyNinja7 answered Sep 20 '22


You can try using CrawlerProcess:

from scrapy.utils.project import get_project_settings
from scrapy.crawler import CrawlerProcess

from myproject.spiders import spider1, spider2

process = CrawlerProcess(get_project_settings())
process.crawl(spider1.Spider1)  # pass the spider class; CrawlerProcess instantiates it
process.crawl(spider2.Spider2)
process.start()  # blocks here until both crawls finish

If you want to see the full log of the crawl, set LOG_FILE in your settings.py.

LOG_FILE = "logs/mylog.log"
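If you'd rather schedule from Python than from cron, note that a CrawlerProcess cannot be started twice in the same process (Twisted's reactor is not restartable), so a simple approach is to launch each run in a fresh subprocess. A rough sketch, where the command and script name are assumptions to adapt to your project:

```python
import subprocess
import time

SIX_HOURS = 6 * 60 * 60  # interval between runs, in seconds

def run_every_six_hours(command):
    """Launch `command` in a fresh process, then sleep six hours, forever."""
    while True:
        subprocess.run(command)  # e.g. ["python", "run_spiders.py"]
        time.sleep(SIX_HOURS)

# run_every_six_hours(["python", "run_spiders.py"])  # hypothetical crawl script
```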
Aminah Nuraini answered Sep 17 '22