
scrapy run spider from script


I want to run my spider from a Python script rather than with the scrapy crawl command.

I found this page

http://doc.scrapy.org/en/latest/topics/practices.html

but it doesn't say where that script should be placed.

Any help, please?

Marco Dinatsoli asked Feb 09 '14 17:02


People also ask

How do you run a Scrapy spider from a Python script?

The key to running Scrapy from a Python script is the CrawlerProcess class in the scrapy.crawler module. It starts the crawling engine inside your script and manages the Twisted reactor that Scrapy runs on.

How do you run a Python spider?

To execute the program, select Run -> Run (or press F5), and confirm the Run settings if required. If so, then you have just run your first Python program - well done.

How do you run multiple spiders in a Scrapy?

CrawlerProcess can run several Scrapy spiders simultaneously in one process. Create a single CrawlerProcess instance with the project settings, call crawl() once per spider, then call start(). Create a separate Crawler instance for a spider only if that spider needs its own custom settings.


1 Answer

It is simple and straightforward :)

Just check the official documentation. I would make one small change so that the spider runs only when you execute python myscript.py, not every time you import something from the file: wrap the crawl in an if __name__ == "__main__": guard.

    import scrapy
    from scrapy.crawler import CrawlerProcess


    class MySpider(scrapy.Spider):
        # Your spider definition
        name = 'myspider'  # placeholder; Scrapy requires every spider to have a name


    if __name__ == "__main__":
        process = CrawlerProcess({
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
        })

        process.crawl(MySpider)
        process.start()  # the script will block here until the crawling is finished

Now save the file as myscript.py and run `python myscript.py`.
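
On the question of where the script should live: a standalone spider like the one above can sit anywhere, because nothing in it depends on a project. If the spider belongs to a Scrapy project instead, get_project_settings() looks for scrapy.cfg starting in the current working directory and moving upward, so the script is normally started from somewhere inside the project tree. Here is a minimal sketch of one way to make the script independent of the working directory. The helper names find_project_root and run_project_spider are hypothetical, not part of Scrapy, and the sketch assumes the script file is stored at or below the directory that holds scrapy.cfg.

```python
import os
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Hypothetical helper: closest directory at or above `start` that holds scrapy.cfg."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if (directory / "scrapy.cfg").is_file():
            return directory
    return None


def run_project_spider(spider_name: str, script_path: Path) -> None:
    """Hypothetical helper: run a project spider by name from any working directory."""
    # Scrapy is imported here so find_project_root stays usable without it.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    root = find_project_root(script_path.parent)
    if root is None:
        raise RuntimeError(f"no scrapy.cfg found at or above {script_path.parent}")

    os.chdir(root)  # get_project_settings() searches upward from the cwd
    process = CrawlerProcess(get_project_settings())
    process.crawl(spider_name)  # the name declared on the spider class
    process.start()  # blocks until the crawl is finished
```

A script saved inside the project could then call run_project_spider("myspider", Path(__file__)) under its if __name__ == "__main__": guard, where "myspider" stands in for your own spider's name.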

Enjoy!

Almog Cohen answered Oct 14 '22 14:10