Scrapy: skip item and continue with execution

Question

I'm writing an RSS spider. If the current item doesn't match, I want to ignore the current node and let the spider continue running. So far I've got this:

        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
        else:
            return None

(info is a string that was extracted from an XPath and sanitized beforehand.)

But I'm getting this exception:

    exceptions.TypeError: You cannot return an "NoneType" object from a spider

So how can I ignore this node and continue with the execution?

seriyPS · Accepted Answer

    def parse(self, response):
        # make some manipulations
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            # a callback may return a list of items
            return [item]
        else:
            # an empty list means "no item for this node"
            return []

It is better, though, not to return a list: yield the item when there is a match, and otherwise do nothing:

    def parse(self, response):
        # make some manipulations
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            yield item
        else:
            # a bare return in a generator just ends it without yielding anything
            return
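The same idea can be tried outside Scrapy. The following is a minimal sketch, not the answerer's code: parse_nodes is a hypothetical stand-in for a spider callback, the input strings are made up, and plain dicts stand in for Scrapy items. It also assumes that splitting on the first ':' only and stripping whitespace is what is wanted, which differs slightly from the split(':')[1] used above.

```python
def parse_nodes(infos):
    """Yield one item per matching string; non-matching strings are skipped."""
    for info in infos:
        if not info.startswith('Foo'):
            # Nothing is yielded for this node; the loop moves on to the next one.
            continue
        item = {}  # plain dict standing in for a Scrapy Item (assumption)
        # maxsplit=1 keeps any further ':' inside the value; strip() drops padding
        item['foo'] = info.split(':', 1)[1].strip()
        yield item


# Made-up sample data: the middle entry does not match and is ignored.
items = list(parse_nodes(['Foo: bar', 'Baz: qux', 'Foo: a:b']))
print(items)
```

Because the callback is a generator, skipping a node needs no special return value: whatever is not yielded simply never reaches the consumer.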

Nour Wolf · Answer

There is an undocumented method I figured out when I had to skip an item during parsing, but from code outside the callback function.

Simply raise StopIteration anywhere during the parsing.

    class MySpider(Spider):
        def parse(self, response):
            # helper methods must be called through self
            value1 = self.parse_something1()
            value2 = self.parse_something2()
            yield Item(value1, value2)

        def parse_something1(self):
            try:
                return get_some_value()
            except Exception:
                self.skip_item()

        def parse_something2(self):
            if something_wrong:
                self.skip_item()

        def skip_item(self):
            # ends the parse() generator before it reaches the yield
            raise StopIteration
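One caveat: since Python 3.7 (PEP 479), a StopIteration that escapes inside a generator body is converted to a RuntimeError, so this trick may not work on current interpreters. A possible alternative, sketched here under assumptions, is to raise a dedicated exception in the helper and catch it in the callback. SkipItem, parse_value and the sample strings are hypothetical names for illustration only; none of them is part of Scrapy, and plain dicts stand in for items.

```python
class SkipItem(Exception):
    """Hypothetical exception (not part of Scrapy) meaning 'drop this item'."""


def parse_value(raw):
    # Helper called from the callback; raises instead of returning a bad value.
    if not raw.startswith('Foo:'):
        raise SkipItem(raw)
    return raw.split(':', 1)[1].strip()


def parse(raw_nodes):
    # Stand-in for a spider callback: a generator over the nodes of one response.
    for raw in raw_nodes:
        try:
            value = parse_value(raw)
        except SkipItem:
            continue  # skip only this node and keep iterating
        yield {'foo': value}


# Made-up sample data: 'broken' raises SkipItem and is left out.
results = list(parse(['Foo: 1', 'broken', 'Foo: 2']))
print(results)
```

Catching the exception inside the loop keeps the skip local to one node, whereas an exception that ends the generator also discards every node that would have come after it.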

Tags: python, scrapy, web-crawler

anders
