I am new to Scrapy, and I apologize if this question is trivial. I have read the documentation on the official website, and while going through it I came across this example:
import scrapy
from myproject.items import MyItem

class MySpider(scrapy.Spider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = [
        'http://www.example.com/1.html',
        'http://www.example.com/2.html',
        'http://www.example.com/3.html',
    ]

    def parse(self, response):
        for h3 in response.xpath('//h3').extract():
            yield MyItem(title=h3)

        for url in response.xpath('//a/@href').extract():
            yield scrapy.Request(url, callback=self.parse)
I know that the parse method must return an item and/or a request, but where are these return values returned to?
One is an item and the other is a request, and I assume these two types are handled differently. And in the case of CrawlSpider, there is a Rule with a callback. Where does that callback's return value go? The same place as parse()'s?
I am very confused about Scrapy's procedure, even after reading the documentation.
Scrapy uses Request and Response objects for crawling web sites. Typically, Request objects are generated in the spiders and pass across the system until they reach the Downloader, which executes the request and returns a Response object which travels back to the spider that issued the request.
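As a minimal sketch of that flow (the URLs, spider name, and parse_page callback below are illustrative, not from the question):

import scrapy

class FlowSpider(scrapy.Spider):
    name = "flow-demo"  # hypothetical spider name
    start_urls = ["http://www.example.com/1.html"]

    def parse(self, response):
        # 'response' is the Response the Downloader produced for one of the
        # start_urls Requests; the engine calls parse() with it.
        for url in response.xpath("//a/@href").getall():
            # Yielding a Request does not "return" it to any caller: the
            # engine schedules it, the Downloader fetches it, and the
            # resulting Response is passed to self.parse_page.
            yield response.follow(url, callback=self.parse_page)

    def parse_page(self, response):
        # Items yielded from this callback travel back to the engine too.
        yield {"url": response.url}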
You can use regular techniques such as printing, logging, or ordinary file handling to save the data a Scrapy spider returns. However, Scrapy offers a built-in way of saving and storing data through the yield keyword.
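For example (a minimal sketch; the spider name and output file are placeholders), yielding plain dicts is enough for Scrapy's feed exports to persist them:

import scrapy

class TitlesSpider(scrapy.Spider):
    name = "titles"  # hypothetical
    start_urls = ["http://www.example.com/1.html"]

    def parse(self, response):
        # Each yielded dict is collected by the engine and handed to the
        # item pipelines / feed exporters -- no manual file handling needed.
        for h3 in response.xpath("//h3/text()").getall():
            yield {"title": h3}

Running scrapy crawl titles -o titles.json would then write every yielded item to titles.json via the built-in feed exporter.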
start_urls contains the links from which the spider starts crawling. If you want to crawl recursively, you should use CrawlSpider and define rules for it.
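Here is a minimal CrawlSpider sketch (the link pattern and parse_item callback name are assumptions for illustration). Note that a Rule callback's yielded values are handled exactly like the return values of parse() in a plain Spider:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class RecursiveSpider(CrawlSpider):
    name = "example.com"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    rules = (
        # Follow every extracted link; for each downloaded page, Scrapy
        # calls parse_item and routes whatever it yields back to the
        # engine, just like Spider.parse(). follow=True keeps the crawl
        # recursive (it defaults to False when a callback is set).
        Rule(LinkExtractor(), callback="parse_item", follow=True),
    )

    def parse_item(self, response):
        yield {"url": response.url, "title": response.xpath("//h3/text()").get()}

A CrawlSpider must not override parse() itself, since CrawlSpider uses that method internally to apply the rules.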
According to the documentation:
The parse() method is in charge of processing the response and returning scraped data (as Item objects) and more URLs to follow (as Request objects).
In other words, returned/yielded items and requests are handled differently: items are handed to the item pipelines and item exporters, while requests are put into the Scheduler, which pipes them to the Downloader to perform the actual HTTP request and return a response. The engine then receives the response and gives it to the spider for processing (via the callback method).
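To make the pipeline side concrete, here is a minimal sketch of an item pipeline (the class name and project path are hypothetical; it only runs once enabled through the ITEM_PIPELINES setting):

# pipelines.py
class TitleCleanupPipeline:
    def process_item(self, item, spider):
        # Scrapy calls this once for every item a spider callback yields;
        # whatever is returned here moves on to the next pipeline stage.
        item["title"] = item.get("title", "").strip()
        return item

# settings.py (hypothetical project path)
# ITEM_PIPELINES = {"myproject.pipelines.TitleCleanupPipeline": 300}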
The whole data-flow process is described in the Architecture Overview page in a very detailed manner.
Hope that helps.