Why is Scrapy's Field a dict?

Tags: python, scrapy

Basically, I have a very standard setup: a spider subclassed from CrawlSpider and an item with three fields that looks like this:

from scrapy.item import Item, Field

class AppdexItem(Item):
    name = Field()
    url = Field()
    desc = Field()

When my spider parses a response it populates an item like this:

i = AppdexItem()
name = hxs.select("//h1[@class='doc-banner-title']/text()")
i['name'] = name.extract()[0]

Now I got confused when I read what Field actually is. This is literally its implementation:

class Field(dict):
    """Container of field metadata"""

It's a plain dict subclass with nothing added. I wondered why that is and stared at the implementation for a while, but it still didn't make any sense. So I ran scrapy shell on a page that was supposed to be parsed into an item, and this is what I got:

In [16]: item = spider.parse_app(response)

In [17]: item.fields
Out[17]: {'desc': {}, 'name': {}, 'url': {}}

In [18]: item['name']
Out[18]: u'Die Kleine Meerjungfrau'

What? Either I'm doing something completely wrong (I did everything the way the official tutorials and examples showed) or Field being a dict is totally pointless.

Can someone explain that to me?

Asked by dAnjou, Dec 21 '22 09:12


1 Answer

Historical reasons. Fields used to have metadata attached to them, and that metadata was stored in the dict. I assume a dict was used because it has a convenient (key=value) constructor. You can see that the last use of this was removed in this commit. At this point it makes very little difference, and Field could just be a plain object (although changing it could be difficult if there's still code out there that assumes it's a dict for some reason).
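To make the point about the (key=value) constructor concrete, here is a minimal sketch in plain Python. It reuses the two-line Field class quoted in the question; SketchItem is a hypothetical, heavily simplified stand-in I made up for illustration (it is not Scrapy's actual Item code), and the serializer key is just an example of metadata someone might attach.

```python
class Field(dict):
    """Container of field metadata (same shape as the class quoted in the question)."""


# Hypothetical, simplified stand-in for an item class; NOT Scrapy's real implementation.
class SketchItem:
    # Field declarations only describe the fields. Each one is a dict of metadata.
    fields = {
        "name": Field(),               # no metadata given, so this is an empty dict
        "url": Field(serializer=str),  # metadata passed via dict's keyword constructor
    }

    def __init__(self):
        # The actual scraped values live in a separate mapping, not in the Field objects.
        self._values = {}

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{key} is not a declared field")
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]


item = SketchItem()
item["name"] = "Die Kleine Meerjungfrau"

print(SketchItem.fields)  # the metadata dicts, empty where nothing was declared
print(item["name"])       # the stored value, independent of the Field object
```

Under these assumptions, the shell session in the question looks less surprising: item.fields shows an empty dict per field because no metadata was passed to Field(), while item['name'] returns the value that was stored separately.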

Answered by Rcxdude, Jan 08 '23 20:01