
How to run a Scrapy spider from a single .py file

Tags: python, scrapy

Hi, I am working with Scrapy. I created a project with scrapy startproject example and wrote a spider to scrape all the data from the URL. I ran the spider with the command scrapy crawl spider_name; it works fine and fetches the data.

But I have a requirement to run the spider from a single .py file, something like:

python -u /path/to/spider_file_inside_scrapy_folder_created.py

Is it possible to run a spider without the scrapy crawl command, after creating a Scrapy project folder containing a spider.py file?

Shiva Krishna Bavandla asked Sep 29 '12 04:09


4 Answers

Yes! If you want to do it programmatically instead of invoking the command via Popen, you can run it as follows:

>>> from scrapy.cmdline import execute
>>> execute(['scrapy','crawl','dmoz'])

Let me know if you have any trouble. For testing purposes I used the example project that the Scrapy docs refer to on GitHub:

https://github.com/scrapy/dirbot
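
To tie this back to the question (running with python /path/to/file.py), here is a minimal sketch of a wrapper script around scrapy.cmdline.execute. The helper names build_crawl_argv and run_crawl are hypothetical, and the sketch assumes the script is started from inside the project directory (where scrapy.cfg lives) so that the crawl command can find the spider by name.

```python
def build_crawl_argv(spider_name, output=None):
    """Build the argv list passed to scrapy.cmdline.execute (hypothetical helper)."""
    argv = ["scrapy", "crawl", spider_name]
    if output:
        # -o writes the scraped items to the given file
        argv += ["-o", output]
    return argv


def run_crawl(spider_name, output=None):
    """Run 'scrapy crawl <spider_name>' in the current Python process."""
    # Imported here so build_crawl_argv can be used without Scrapy installed.
    from scrapy.cmdline import execute

    # Assumption: the current working directory is inside the Scrapy project.
    # execute() ends the process when the command finishes, so code placed
    # after this call will normally not run.
    execute(build_crawl_argv(spider_name, output))
```

Adding a call such as run_crawl("dmoz") at the bottom of the file and then running python -u on that file should behave like scrapy crawl dmoz.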

damzam answered Oct 09 '22 17:10


Try the runspider command:

scrapy runspider /path/to/spider_file_inside_scrapy_folder_created.py

Steven Almeroth answered Oct 09 '22 17:10


I think the answer (if I understand your question) is now to use the API:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()
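
If you also want the script to write the scraped items to a file, one option is the FEEDS setting. The following is only a sketch: build_settings and run_spider are hypothetical helper names, and it assumes a Scrapy version that supports FEEDS (2.1 or later, as far as I know).

```python
def build_settings(user_agent, output_path=None, output_format="json"):
    """Return a settings dict for CrawlerProcess (hypothetical helper).

    FEEDS is only added when an output path is given.
    """
    settings = {"USER_AGENT": user_agent}
    if output_path:
        settings["FEEDS"] = {output_path: {"format": output_format}}
    return settings


def run_spider(spider_cls, settings):
    """Run one spider class in this process and block until it finishes."""
    # Imported here so build_settings can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings)
    process.crawl(spider_cls)
    process.start()  # blocks until the crawl is done
```

With the MySpider class from above, calling run_spider(MySpider, build_settings("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)", "items.json")) at the end of the file lets you start everything with python -u on that one file.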

mikebridge answered Oct 09 '22 15:10


Yes, you can. First, navigate in the command prompt to the directory where the xyz.py file is located. Then run:

scrapy runspider xyz.py

If you want to save the output, add the -o option:

scrapy runspider xyz.py -o output.csv

You can also save the output as JSON, for example by giving the output file a .json extension.
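
If the spider has to be started from a Python script rather than typed at the prompt, the same runspider command can be launched with subprocess. This is a rough sketch under a few assumptions: runspider_command and run are hypothetical names, Scrapy is installed for the interpreter that runs the script, and xyz.py is the spider file from above.

```python
import subprocess
import sys


def runspider_command(spider_file, output=None):
    """Build the 'scrapy runspider' command line as a list (hypothetical helper)."""
    # 'python -m scrapy' uses the Scrapy installed for the current interpreter.
    cmd = [sys.executable, "-m", "scrapy", "runspider", spider_file]
    if output:
        # e.g. output.csv or output.json
        cmd += ["-o", output]
    return cmd


def run(spider_file, output=None):
    """Run the spider in a child process and wait for it to finish."""
    # check=True raises CalledProcessError if Scrapy exits with an error code.
    return subprocess.run(runspider_command(spider_file, output), check=True)
```

For example, run("xyz.py", "output.csv") should correspond to scrapy runspider xyz.py -o output.csv.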

Ashish Kapil answered Oct 09 '22 17:10