I have a very basic setup: a spider subclassed from CrawlSpider
and an item with three fields that looks like this:
class AppdexItem(Item):
    name = Field()
    url = Field()
    desc = Field()
When my spider parses a response it populates an item like this:
i = AppdexItem()
name = hxs.select("//h1[@class='doc-banner-title']/text()")
i['name'] = name.extract()[0]
Now I got confused when I read what Field actually is. This is literally its implementation:
class Field(dict):
    """Container of field metadata"""
It's a plain, simple dict. I wondered why that is and stared at the implementation for a while, but it still didn't make any sense. So I ran scrapy shell on a page that was supposed to be parsed into an item, and this is what I got:
In [16]: item = spider.parse_app(response)
In [17]: item.fields
Out[17]: {'desc': {}, 'name': {}, 'url': {}}
In [18]: item['name']
Out[18]: u'Die Kleine Meerjungfrau'
What? Either I'm doing something completely wrong (I did everything the way the official tutorials and examples told me to) or Field being a dict is totally pointless.
Can someone explain that to me?
Historical reasons. Metadata used to be attached to the fields, and it was stored in the dict. I assume a dict was used because it has a convenient key=value constructor. You can see that the last use of this was removed in this commit. At this point it makes very little difference, and Field could just as well be a plain object (although changing it could be difficult if there is still code out there that assumes it's a dict for some reason).
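To illustrate the key=value constructor mentioned above, here is a minimal sketch. It uses a stand-in Field class copied from the definition quoted in the question rather than importing scrapy, and the metadata keys "serializer" and "required" are only assumptions chosen for the example, not keys taken from the question.

```python
# Stand-in that mirrors the Field definition quoted in the question,
# so the snippet runs without scrapy installed.
class Field(dict):
    """Container of field metadata"""


# With no arguments, a Field is just an empty dict, which is why
# item.fields in the shell session shows {} for every field.
plain = Field()

# Keyword arguments are stored as metadata entries by dict's constructor.
# "serializer" and "required" are hypothetical keys used for illustration.
annotated = Field(serializer=str, required=True)

print(plain)                       # {}
print(annotated["serializer"])     # <class 'str'>
print(annotated.get("required"))   # True
```

So the empty dicts in item.fields are simply fields declared without any metadata, while the scraped values themselves are stored separately on the item.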