When I write a parse() function, can I yield both a request and items for one single page?
I want to extract some data from page A and store it in a database, and also extract the links to be followed (this can be done with a rule in CrawlSpider).
Let me call the pages that page A links to the B pages. I can write another parse_item() to extract data from the B pages, but I also want to extract some links from the B pages, so is a rule the only way to extract links? And how do I deal with duplicate URLs in Scrapy?
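In other words, the structure being asked about looks roughly like the following sketch, where the spider name, URLs, and XPath expressions are invented placeholders:

import scrapy

class ABSpider(scrapy.Spider):
    name = 'ab_spider'                          # hypothetical spider name
    start_urls = ['http://example.com/a']       # placeholder URL for an A page

    def parse(self, response):
        # A page: yield the items for this page's data ...
        for value in response.xpath('//div[@class="data"]/text()').extract():   # hypothetical XPath
            yield {'value': value}
        # ... and, from the same callback, yield requests for the B pages
        for href in response.xpath('//a[@class="to-b"]/@href').extract():       # hypothetical XPath
            yield scrapy.Request(response.urljoin(href), callback=self.parse_item)

    def parse_item(self, response):
        # B page: again yield both an item and further requests
        yield {'b_value': response.xpath('//h1/text()').extract_first()}        # hypothetical XPath
        for href in response.xpath('//a[@class="next"]/@href').extract():       # hypothetical XPath
            yield scrapy.Request(response.urljoin(href), callback=self.parse_item)
        # Scrapy's built-in duplicate filter drops requests whose URL has
        # already been seen, so re-discovered links are silently ignored.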
Scrapy uses Request and Response objects for crawling web sites. Typically, Request objects are generated in the spiders and pass across the system until they reach the Downloader, which executes the request and returns a Response object which travels back to the spider that issued the request.
You can use the FormRequest.from_response() method for this job. Here's the start of an example spider which uses it:

import scrapy

def authentication_failed(response):
    # TODO: Check the contents of the response and return True if it failed
    # or False if it succeeded.
    pass
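The snippet above is cut off; the rest of that pattern, sketched after the login example in the Scrapy documentation (the URLs, username, and password are placeholders):

class LoginSpider(scrapy.Spider):
    name = 'example.com'
    start_urls = ['http://www.example.com/users/login.php']

    def parse(self, response):
        # Fill in and submit the login form found in the response
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'john', 'password': 'secret'},
            callback=self.after_login,
        )

    def after_login(self, response):
        if authentication_failed(response):
            self.logger.error('Login failed')
            return
        # Logged in successfully; continue crawling from here.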
Making a request is a straightforward process in Scrapy. To generate a request, you need the URL of the webpage from which you want to extract useful data. You also need a callback function. The callback function is invoked when there is a response to the request.
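A minimal illustration of that, with a placeholder URL and a hypothetical parse_page callback:

import scrapy

class ExampleSpider(scrapy.Spider):
    name = 'example'
    start_urls = ['http://example.com/']

    def parse(self, response):
        # The callback (parse_page) is invoked once the downloader returns a response
        yield scrapy.Request('http://example.com/some-page', callback=self.parse_page)

    def parse_page(self, response):
        yield {'title': response.xpath('//title/text()').extract_first()}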
Yes, you can yield both requests and items. From what I've seen:
from urlparse import urljoin
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    base_url = response.url
    links = hxs.select(self.toc_xpath)
    for index, link in enumerate(links):
        href, text = link.select('@href').extract(), link.select('text()').extract()
        # Follow each link; parse2 will be called with its response
        yield Request(urljoin(base_url, href[0]), callback=self.parse2)

    # Also extract items from the current page by reusing parse2
    for item in self.parse2(response):
        yield item
I'm not 100% sure I understand your question, but the code below requests sites from a starting URL using BaseSpider, then scans the starting URL for hrefs and loops over each link, calling parse_url. Everything matched in parse_url is sent to your item pipeline.
import urlparse
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector
from zipgrabber.items import ZipgrabberItem  ## hypothetical items module for this project

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    urls = hxs.select('//a[contains(@href, "content")]/@href').extract()  ## only grab urls with "content" in the url name
    for i in urls:
        yield Request(urlparse.urljoin(response.url, i[1:]), callback=self.parse_url)

def parse_url(self, response):
    hxs = HtmlXPathSelector(response)
    item = ZipgrabberItem()
    item['zip'] = hxs.select("//div[contains(@class,'odd')]/text()").extract()  ## this grabs the zip text
    return item