
How to implement nested items in Scrapy?

I am scraping some data with complex hierarchical structure and need to export the result to JSON.

I defined the items as

    from scrapy.item import Item, Field

    class FamilyItem(Item):
        name = Field()
        sons = Field()

    class SonsItem(Item):
        name = Field()
        grandsons = Field()

    class GrandsonsItem(Item):
        name = Field()
        age = Field()
        weight = Field()
        sex = Field()

and when the spider finishes running, I get a printed item output like

    {'name': 'Jenny',
     'sons': [
         {'name': u'S1',
          'grandsons': [
              {'name': u'GS1', 'age': 18, 'weight': 50},
              {'name': u'GS2', 'age': 19, 'weight': 51}]
         }]
    }

But when I run scrapy crawl myscaper -o a.json, it always says the result "is not JSON serializable". When I copy and paste the item output into an IPython console and call json.dumps() on it, it works fine. So where is the problem? This is driving me nuts...

Shadow Lau asked Jun 25 '12 06:06



2 Answers

When saving the nested items, make sure to wrap them in a call to dict(), e.g.:

    gs1 = GrandsonsItem()
    gs1['name'] = 'GS1'
    gs1['age'] = 18
    gs1['weight'] = 50

    gs2 = GrandsonsItem()
    gs2['name'] = 'GS2'
    gs2['age'] = 19
    gs2['weight'] = 51

    s1 = SonsItem()
    s1['name'] = 'S1'
    s1['grandsons'] = [dict(gs1), dict(gs2)]

    jenny = FamilyItem()
    jenny['name'] = 'Jenny'
    jenny['sons'] = [dict(s1)]
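If the nesting gets deeper, wrapping every level in dict() by hand gets tedious. Here is a minimal sketch of a recursive converter, assuming the items behave like mappings. The names `StandInItem` and `to_plain` are hypothetical; `StandInItem` only stands in for a Scrapy item so the snippet runs without Scrapy installed, and it is not part of Scrapy's API.

```python
import json
from collections.abc import Mapping, MutableMapping


class StandInItem(MutableMapping):
    """Hypothetical stand-in for a Scrapy item: mapping-like, but not a dict."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def to_plain(obj):
    """Recursively turn mapping-like objects and sequences into dicts and lists."""
    if isinstance(obj, Mapping):
        return {key: to_plain(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_plain(value) for value in obj]
    # Strings, numbers, None and other leaf values are returned unchanged.
    return obj


# Nested items are left unwrapped here; to_plain converts every level at once.
gs1 = StandInItem(name="GS1", age=18, weight=50)
s1 = StandInItem(name="S1", grandsons=[gs1])
jenny = StandInItem(name="Jenny", sons=[s1])

plain = to_plain(jenny)
encoded = json.dumps(plain, sort_keys=True)
print(encoded)
```

Passing `jenny` straight to json.dumps() raises TypeError, because the json module only serializes real dicts, not arbitrary mapping-like objects. That would also explain why the pasted output worked in the console: once pasted, it is a plain dict literal.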
Myle Ott answered Sep 28 '22 06:09



I'm not sure whether there's a way to do nested items in Scrapy with classes, but lists work fine. You could do something like this:

    grandson = Grandson(name='Grandson', age=2)
    son = Son(name='Son', grandsons=[grandson])
    item = Item(name='Name', son=[son])
Leo answered Sep 28 '22 06:09
