What is the difference between yield and return? Can you explain with an example? And what actually happens when we yield a value or a Request from the generator?
I'm not calling my generator from any other function or program. My loop is:

for index in range(3):
    yield Request(url, callback=parse)

This makes a request to the given url and calls the callback function after each request completes. What is this code doing, and in what sequence does it run?
I guess you are puzzled by the yield inside the function start_requests().
For example:
def start_requests(self):
    urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]
    for url in urls:
        yield scrapy.Request(url=url, callback=self.parse)
When you look at the Scrapy spider documentation and find the method start_requests(), it says the method must return an iterable. With yield, the method is a generator, which is an iterable. If you changed yield to return, the function would exit on the first pass through the for loop and hand back a single Request rather than an iterable, so the remaining urls would never be requested.
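To see the difference in plain Python, here is a minimal sketch (with_return and with_yield are made-up names for illustration):

def with_return():
    for i in range(3):
        return i  # exits immediately on the first iteration

def with_yield():
    for i in range(3):
        yield i  # pauses here; resumes where it left off next time

print(with_return())       # 0
print(list(with_yield()))  # [0, 1, 2]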
It is natural for your spider to send HTTP requests to these destinations one by one, so a generator is the best fit. Inside the for loop, your spider pauses at yield, handing a scrapy.Request() to the engine; once that request has been dealt with, the engine resumes the generator (via next()/send()) and the loop moves on to the next url in the list.
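Roughly, the engine consumes the generator like this; this is a simplified stand-in, not Scrapy's real engine code, and the yielded strings substitute for real scrapy.Request objects:

def start_requests():
    urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]
    for url in urls:
        yield url  # execution pauses here until the consumer asks for the next item

gen = start_requests()
for request in gen:  # each iteration resumes the generator at the yield
    print('scheduling', request)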
The only aspect of your question that isn't answered by the question @Jochen linked to is "I'm not calling my generator from any other function or program." You define your crawler class, and Scrapy calls the (special) methods you define, as specified in the documentation. (For example, parse is the default callback for requests that don't specify one.)
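For instance, a minimal spider along these lines is never called by your own code; Scrapy instantiates it, iterates start_requests(), and routes each response to parse because no other callback was given (the spider name and url here are just placeholders):

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'

    def start_requests(self):
        # Scrapy iterates this generator itself; we never call it directly.
        yield scrapy.Request(url='http://quotes.toscrape.com/page/1/')

    def parse(self, response):
        # Default callback: used because the Request above set none.
        self.log(f'visited {response.url}')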