I want to run my spider from a script rather than with the scrapy crawl command.
I found this page
http://doc.scrapy.org/en/latest/topics/practices.html
but actually it doesn't say where to put that script.
any help please?
The key to running Scrapy from a Python script is the CrawlerProcess class, which lives in the scrapy.crawler module. It provides the engine that runs Scrapy inside a Python script; internally, CrawlerProcess uses Python's Twisted framework.
CrawlerProcess can also run multiple Scrapy spiders simultaneously in a single process: create one CrawlerProcess instance with the project settings, then schedule each spider on it. If a spider needs settings of its own, give it per-spider settings (or create a separate Crawler for it).
It is simple and straightforward :)
Just check the official documentation. I would make one small change there, so the spider runs only when you do python myscript.py and not every time you merely import from it: add an if __name__ == "__main__" guard:
import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    pass

if __name__ == "__main__":
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    process.crawl(MySpider)
    process.start()  # the script will block here until the crawling is finished
Now save the file as myscript.py and run python myscript.py.
Enjoy!