
Scrapy: If key exists, why do I get a KeyError?

With items.py defined:

import scrapy 

class CraigslistSampleItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()

and populating each item via the spider thus:

item = CraigslistSampleItem()
item["title"] = $someXpath.extract() 
item["link"] = $someOtherXpath.extract()

When I append these to a list (returned by parse()) and store it as e.g. a CSV, I get two columns of data, title and link, as expected. If I comment out the XPath for link and store the result as a CSV, I still get two columns of data, with the values in the link column being empty strings. This seems reasonable, as both title and link are fields declared on the CraigslistSampleItem class. I would think, then, that I could do something like this (with the XPath for link still commented out):

if item["link"] == '':
    print("link has not been given a value")

Yet the attempt to get the link attribute for each item fails thus:

File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scrapy/item.py", line 50, in __getitem__
    return self._values[key]
exceptions.KeyError: 'link'

If each item instance does indeed have a value for link (albeit an empty string), why can't I access this key?

Pyderman asked Mar 15 '23 21:03


1 Answer

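Scrapy's Item class provides a dictionary-like interface for storing the extracted data. Declaring a field does not give it a default value: a field that was never assigned has no entry in the item, so item["link"] raises a KeyError. The empty strings in the CSV are written by the exporter for fields it finds missing; they are not stored on the item itself.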
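
The traceback in the question shows the lookup going through `self._values[key]`, which suggests that declared fields and assigned values are tracked separately. The sketch below is a simplified stand-in written for illustration, not Scrapy's actual implementation; the `SketchItem` class and its `fields` tuple are hypothetical names, and the code assumes Python 3.

```python
from collections.abc import MutableMapping


class SketchItem(MutableMapping):
    """Simplified stand-in for an item class; not Scrapy's real code."""

    # Declared field names (hypothetical; mirrors title/link from the question).
    fields = ("title", "link")

    def __init__(self):
        # Only fields that were actually assigned end up in here.
        self._values = {}

    def __getitem__(self, key):
        # Raises KeyError for a declared field that was never assigned.
        return self._values[key]

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError("%s is not a declared field" % key)
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


item = SketchItem()
item["title"] = "test"

try:
    item["link"]
    lookup_failed = False
except KeyError:
    # "link" is declared but was never assigned, so the lookup fails.
    lookup_failed = True
```

Under these assumptions, declaring `link` only makes the key permissible; it does not create an entry, which is why the membership test is the reliable way to ask whether a value was set.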

To check whether the field was set or not, simply check for the field key in the item instance:

if 'link' not in item:
    print("link has not been given a value")

Demo:

In [1]: import scrapy

In [2]: class CraigslistSampleItem(scrapy.Item):
   ...:         title = scrapy.Field()
   ...:         link = scrapy.Field()
   ...:     

In [3]: item = CraigslistSampleItem()

In [4]: item["title"] = "test"

In [5]: item
Out[5]: {'title': 'test'}

In [6]: "link" in item
Out[6]: False

In [7]: item["link"] = "test link"

In [8]: "link" in item
Out[8]: True
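
Because the item behaves like a dictionary, the `== ''` comparison from the question can presumably be kept by reading the field through `.get()` with a default, which is standard mapping behaviour. The sketch below uses a plain `dict` in place of a populated item so that it runs without Scrapy installed; `link_or_empty` is a hypothetical helper name.

```python
def link_or_empty(item):
    """Return the item's link, or '' when the field was never set."""
    # .get() never raises KeyError; it falls back to the default instead.
    return item.get("link", "")


# A plain dict stands in for an item whose link XPath was commented out.
item = {"title": "test"}

if link_or_empty(item) == "":
    message = "link has not been given a value"
else:
    message = "link is set"

print(message)
```

Note that this treats an unset field and a field explicitly set to an empty string the same way; use the `in` check when the two cases need to be told apart.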

alecxe answered Mar 25 '23 00:03