I have found lots of Scrapy tutorials (such as this good tutorial) that all need the steps listed below. The result is a project with lots of files (scrapy.cfg + some .py files + a specific folder structure).
How can I make the steps listed below work as a single, self-contained Python file that can be run with python mycrawler.py (instead of a full project with lots of files, some .cfg files, etc., and having to use scrapy crawl myproject -o myproject.json)?
By the way, it seems that scrapy is a new shell command. Is this true?
Note: here could be an answer to this question, but unfortunately it is deprecated and no longer works.
1) Create a new scrapy project with scrapy startproject myproject
2) Define the data structure with an Item, like this:
from scrapy.item import Item, Field

class MyItem(Item):
    title = Field()
    link = Field()
    ...
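3) Define the crawler with:
# BaseSpider and HtmlXPathSelector come from old Scrapy releases and are
# deprecated; current versions use scrapy.Spider and response.xpath() instead.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class MySpider(BaseSpider):
    name = "myproject"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        ...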
4) Run with:
scrapy crawl myproject -o myproject.json
You can run Scrapy spiders as a single script, without starting a project, by using runspider.
Is this what you wanted?
# myscript.py
from scrapy.item import Item, Field
from scrapy import Spider


class MyItem(Item):
    title = Field()
    link = Field()


class MySpider(Spider):
    start_urls = ['http://www.example.com']
    name = 'samplespider'

    def parse(self, response):
        item = MyItem()
        item['title'] = response.xpath('//h1/text()').extract()
        item['link'] = response.url
        yield item
Now you can run this with scrapy runspider myscript.py -o out.json
scrapy is not a Unix command; it is just an executable, like python, javac, gcc, etc. Because you are using a framework, you have to use the command provided by that framework.
One thing you can do is create a bash script and execute it whenever you need to, or call it from some other program.
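The same wrapper idea can be written in Python instead of bash: a small script that shells out to the scrapy executable, exactly as a bash script would. This is a sketch that assumes scrapy is installed and on PATH; the helper names build_runspider_command and run_spider_file are hypothetical.

```python
import os
import subprocess
import sys


def build_runspider_command(script, output):
    """Return the argv list for: scrapy runspider SCRIPT -o OUTPUT."""
    return ["scrapy", "runspider", script, "-o", output]


def run_spider_file(script, output=None):
    """Run one spider file through the `scrapy` executable found on PATH.

    Equivalent to typing the command in a shell; check=True raises
    subprocess.CalledProcessError if scrapy exits with a non-zero status.
    """
    if output is None:
        # myscript.py -> myscript.json
        output = os.path.splitext(script)[0] + ".json"
    cmd = build_runspider_command(script, output)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # e.g. python run_spiders.py myscript.py otherspider.py
    for script in sys.argv[1:]:
        run_spider_file(script)
```

Passing an argument list (rather than one shell string) avoids quoting problems with file names.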
Alternatively, for simple cases you can write the crawler yourself using urllib3.
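A minimal sketch of that approach, assuming urllib3 is installed (pip install urllib3) and the pages are UTF-8. urllib3 only fetches pages; parsing here uses the standard library's html.parser, which is my own choice, and the names H1Collector, parse_titles and fetch are hypothetical. Unlike Scrapy, this does no link following, throttling, or retry handling beyond urllib3's defaults.

```python
import sys
from html.parser import HTMLParser


class H1Collector(HTMLParser):
    """Collect the text inside every <h1> element."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.titles[-1] += data


def parse_titles(html):
    parser = H1Collector()
    parser.feed(html)
    parser.close()
    return parser.titles


def fetch(url):
    import urllib3  # third-party dependency, imported only when fetching

    http = urllib3.PoolManager()
    resp = http.request("GET", url)
    # assumption: the page is UTF-8; undecodable bytes are replaced
    return resp.data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # e.g. python h1crawler.py http://www.example.com
    for url in sys.argv[1:]:
        print({"title": parse_titles(fetch(url)), "link": url})
```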