I am scraping some data with complex hierarchical info and need to export the result to JSON.
I defined the items as
    from scrapy.item import Item, Field

    class FamilyItem(Item):
        name = Field()
        sons = Field()

    class SonsItem(Item):
        name = Field()
        grandsons = Field()

    class GrandsonsItem(Item):
        name = Field()
        age = Field()
        weight = Field()
        sex = Field()

and when the spider finishes running, I get a printed item output like
    {'name': 'Jenny',
     'sons': [
         {'name': u'S1',
          'grandsons': [
              {'name': u'GS1', 'age': 18, 'weight': 50},
              {'name': u'GS2', 'age': 19, 'weight': 51}]}]}
but when I run

    scrapy crawl myscaper -o a.json

it always says the result "is not JSON serializable". When I copy and paste the item output into an IPython console and call json.dumps() on it, it works fine. So where is the problem? This is driving me nuts...
When saving the nested items, make sure to wrap them in a call to dict(), e.g.:
    gs1 = GrandsonsItem()
    gs1['name'] = 'GS1'
    gs1['age'] = 18
    gs1['weight'] = 50

    gs2 = GrandsonsItem()
    gs2['name'] = 'GS2'
    gs2['age'] = 19
    gs2['weight'] = 51

    s1 = SonsItem()
    s1['name'] = 'S1'
    s1['grandsons'] = [dict(gs1), dict(gs2)]

    jenny = FamilyItem()
    jenny['name'] = 'Jenny'
    jenny['sons'] = [dict(s1)]
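If the nesting gets deep, wrapping every level by hand is easy to miss. A small recursive helper can do the conversion in one place instead. This is only a sketch under assumptions: FakeItem is a made-up stand-in so the example runs without Scrapy (it assumes Scrapy items are dict-like mappings that are not dict subclasses, which would explain why json rejects them), and to_plain is a hypothetical helper name, not part of Scrapy.

```python
import json
from collections.abc import Mapping, MutableMapping


class FakeItem(MutableMapping):
    """Hypothetical stand-in for a Scrapy item: dict-like, but not a dict subclass."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


def to_plain(value):
    """Recursively turn mappings into dicts and lists/tuples into lists."""
    if isinstance(value, Mapping):
        return {key: to_plain(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(val) for val in value]
    return value


gs1 = FakeItem(name='GS1', age=18, weight=50)
s1 = FakeItem(name='S1', grandsons=[gs1])
jenny = FakeItem(name='Jenny', sons=[s1])

# Converting only the outer level is not enough: s1 is still a FakeItem.
try:
    json.dumps(dict(jenny))
    nested_ok = True
except TypeError:
    nested_ok = False

# Converting every level yields plain dicts and lists that json accepts.
plain = to_plain(jenny)
encoded = json.dumps(plain)
```

With real items you would call the helper on the top-level item just before yielding it from the spider, so the nested items can stay as item objects while you build them.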
Not sure if there's a way to do nested items in Scrapy with classes, but lists work fine. You could do something like this:
    grandson = Grandson(name='Grandson', age=2)
    son = Son(name='Son', grandsons=[grandson])
    item = Item(name='Name', son=[son])