
Scrapy with a nested array

I'm new to Scrapy and would like to understand how to scrape an object for output as nested JSON. Right now, I'm producing JSON that looks like this:

[
{'a' : 1, 
'b' : '2',
'c' : 3},
]

And I'd like it more like this:

[
{ 'a' : '1',
'_junk' : {
     'b' : 2,
     'c' : 3}},
]

where I put some stuff in _junk subfields to post-process later.

The current code in the parser definition in my scrapername.py is:

item['a'] = x
item['b'] = y
item['c'] = z

And it seemed like

item['a'] = x
item['_junk']['b'] = y
item['_junk']['c'] = z

might fix that, but I'm getting an error about the _junk key:

  File "/usr/local/lib/python2.7/dist-packages/scrapy/item.py", line 49, in __getitem__
    return self._values[key]
exceptions.KeyError: '_junk'

Does this mean I need to change my items.py somehow? Currently I have:

class Website(Item):
    a = Field()
    _junk = Field()
    b = Field()
    c = Field()
Mittenchops asked Mar 19 '13 18:03


1 Answer

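You need to create the _junk dictionary before storing items in it. The statement item['_junk']['b'] = y first looks up item['_junk'], and that lookup raises the KeyError because nothing has been assigned to that field yet.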

item['a'] = x
item['_junk'] = {}
item['_junk']['b'] = y
item['_junk']['c'] = z
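
As a rough illustration of why the lookup fails and how the fix produces nested JSON, here is a minimal sketch that runs without Scrapy installed. SketchItem is a hypothetical stand-in that only imitates the _values lookup visible in the traceback, not the real scrapy.Item, and the values 1, 2, 3 for x, y, z are made up for the example.

```python
import json


class SketchItem:
    """Hypothetical stand-in for a Scrapy Item (assumption, not the real class).

    It stores values in a private dict and only accepts declared field names.
    """

    fields = ("a", "_junk")  # b and c are not declared; they live inside _junk

    def __init__(self):
        self._values = {}

    def __getitem__(self, key):
        # Raises KeyError when the field has not been assigned yet.
        return self._values[key]

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError("%s is not a declared field" % key)
        self._values[key] = value


x, y, z = 1, 2, 3  # placeholder values for the example

item = SketchItem()

# Reproduce the error: the nested assignment reads item['_junk'] first.
try:
    item["_junk"]["b"] = y
    missing = None
except KeyError as exc:
    missing = exc.args[0]

# The fix: assign the dictionary first (here built in one step).
item["a"] = x
item["_junk"] = {"b": y, "c": z}

output = json.dumps([item._values], sort_keys=True)
print(missing)
print(output)
```

If the real Item behaves the way this sketch assumes, items.py would probably only need the a and _junk fields, since b and c are then ordinary keys inside the _junk dictionary rather than fields of their own.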
Mel Nicholson answered Sep 29 '22 02:09