Hi, I am working with Scrapy. I created a project with scrapy startproject example
and wrote a spider to scrape all the data from a URL.
I ran the spider with the command scrapy crawl spider_name,
and it works fine and fetches the data.
However, I need to run the spider from a single .py file, something like:
python -u /path/to/spider_file_inside_scrapy_folder_created.py
Is it possible to run a spider without the scrapy crawl
command after creating a Scrapy project folder with a spider.py file?
Yes! If you want to do it programmatically instead of invoking the command via Popen, you can run it as follows:
>>> from scrapy.cmdline import execute
>>> execute(['scrapy', 'crawl', 'dmoz'])
Let me know if you have any trouble. I used the version that the Scrapy docs refer to on GitHub for testing purposes:
https://github.com/scrapy/dirbot
Try the runspider command:
scrapy runspider /path/to/spider_file_inside_scrapy_folder_created.py
I think the answer (if I understand your question) is to use the API:

import scrapy
from scrapy.crawler import CrawlerProcess


class MySpider(scrapy.Spider):
    # Your spider definition
    ...


process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(MySpider)
process.start()
Yes, you can. First navigate to the directory where the xyz.py file is located via the command prompt. Then run:
scrapy runspider xyz.py
If you want to save the output, run:
scrapy runspider xyz.py -o output.csv
Or save the output as JSON instead:
scrapy runspider xyz.py -o output.json