I'm writing an RSS spider. If the current item doesn't match, I want to ignore that node and let the spider carry on. So far I've got this:
    if info.startswith('Foo'):
        item['foo'] = info.split(':')[1]
    else:
        return None
(info is a string extracted with an XPath expression and sanitized beforehand.)
But I'm getting this exception:
    exceptions.TypeError: You cannot return an "NoneType" object from a spider
So how can I ignore this node and continue with the execution?
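    def parse(self, response):
        # make some manipulations (build item and info here)
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            return [item]
        else:
            # an empty list is a valid result: nothing is scraped for this node
            return []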
Better still, don't use return at all: yield the item when it matches, and do nothing otherwise.
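    def parse(self, response):
        # make some manipulations (build item and info here)
        if info.startswith('Foo'):
            item['foo'] = info.split(':')[1]
            yield item
        else:
            # a bare return just ends the generator without yielding anything
            return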
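To see the yield approach in isolation, here is a minimal sketch that runs without Scrapy. The function name `parse_items` and the sample strings are made up for illustration; in a real spider the loop body would live inside the `parse` callback and the strings would come from the XPath results.

```python
def parse_items(infos):
    """Yield one dict per matching string; non-matching strings are skipped."""
    for info in infos:
        if not info.startswith('Foo'):
            # Nothing is yielded for this node, so it is simply ignored
            continue
        yield {'foo': info.split(':')[1]}


# Hypothetical sample data standing in for sanitized XPath results
items = list(parse_items(['Foo:1', 'Bar:2', 'Foo:3']))
print(items)
```

Since a generator that yields nothing is still a valid iterable, the callback never hands a None back to the framework.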
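There is an undocumented trick I found when I had to skip an item during parsing from a helper method, outside the callback function itself: simply raise StopIteration anywhere during the parsing. Note that this only works on older interpreters. Since Python 3.7 (PEP 479), a StopIteration that propagates into a generator body is converted into a RuntimeError.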
    from scrapy import Spider

    class MySpider(Spider):
        def parse(self, response):
            value1 = self.parse_something1()
            value2 = self.parse_something2()
            yield Item(value1, value2)

        def parse_something1(self):
            try:
                return get_some_value()
            except Exception:
                self.skip_item()

        def parse_something2(self):
            if something_wrong:
                self.skip_item()

        def skip_item(self):
            # propagates up into parse() and ends the generator before the yield
            raise StopIteration
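A variant of the same idea that does not depend on how the interpreter treats StopIteration is to raise a dedicated exception in the helper and catch it in the callback. The sketch below is plain Python rather than Scrapy code; `SkipItem`, `parse_value` and the sample rows are hypothetical names invented for this example.

```python
class SkipItem(Exception):
    """Hypothetical marker exception: the current item should be dropped."""


def parse_value(raw):
    # Helper outside the callback: signals a skip instead of returning None
    if not raw.startswith('Foo'):
        raise SkipItem(raw)
    return raw.split(':')[1]


def parse(rows):
    # Stand-in for a spider callback; rows plays the role of the response nodes
    for raw in rows:
        try:
            value = parse_value(raw)
        except SkipItem:
            # Ignore this node and move on to the next one
            continue
        yield {'foo': value}


result = list(parse(['Foo:a', 'Bar:b', 'Foo:c']))
print(result)
```

Unlike StopIteration, which ends the whole generator, this skips only the offending node and keeps processing the rest.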