I am new to Python and Scrapy. I tried to run some existing code, but I get this error on every address:
> 2015-07-02 01:52:19 [scrapy] DEBUG: Crawled (200) <GET http://www.tripadvisor.com/ShowUserReviews-g187147-d197524-r281927613-Hotel_Mirific_Opera-Paris_Ile_de_France.html> (referer: http://www.tripadvisor.com/Hotel_Review-g187147-d197524-Reviews-Hotel_Mirific_Opera-Paris_Ile_de_France.html)
> 2015-07-02 01:52:19 [scrapy] ERROR: Spider error processing <GET http://www.tripadvisor.com/ShowUserReviews-g187147-d197524-r281927613-Hotel_Mirific_Opera-Paris_Ile_de_France.html> (referer: http://www.tripadvisor.com/Hotel_Review-g187147-d197524-Reviews-Hotel_Mirific_Opera-Paris_Ile_de_France.html)
>
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/defer.py", line 102, in iter_errback
>     yield next(it)
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/offsite.py", line 28, in process_spider_output
>     for x in result:
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/referer.py", line 22, in <genexpr>
>     return (_set_referer(r) for r in result or ())
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
>     return (r for r in result or () if _filter(r))
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spidermiddlewares/depth.py", line 54, in <genexpr>
>     return (r for r in result or () if _filter(r))
>   File "/usr/local/lib/python2.7/dist-packages/scrapy/spiders/crawl.py", line 67, in _parse_response
>     cb_res = callback(response, **cb_kwargs) or ()
>   File "/home/talmosko/Documents/scrapy/tripAdvisor/spiders/tripAdvisor.py", line 30, in parse_item
>     item['state'] = hxs.xpath('//*[@id="PAGE"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')
> IndexError: list index out of range
This is my code: http://pastebin.com/XzM5DrDD
What is the problem? It seems like the spider didn't get a response.
Thanks!
You are trying to access an element that doesn't exist. The error is in this line:
item['state'] = hxs.xpath('//*[@id="PAGE"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()[0].encode('ascii', errors='ignore')
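Probably the list returned by
hxs.xpath('//*[@id="PAGE"]/div[2]/div[1]/ul/li[2]/a/span/text()').extract()
is empty, so indexing it with [0] raises the IndexError. Note that the page itself was fetched fine (the log shows a 200 response); the XPath simply matched nothing on it. You have two options: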
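As a rough sketch of two common ways to guard against an empty result (these are assumptions about what such options could look like, not necessarily the ones this answer had in mind): check the list before indexing, or catch the IndexError. The helper names `first_or_default` and `first_with_try` are hypothetical, and the plain lists below stand in for what `.extract()` would return.

```python
def first_or_default(values, default=u''):
    """Option 1: check that the list is non-empty before indexing it."""
    return values[0] if values else default


def first_with_try(values, default=u''):
    """Option 2: index anyway and fall back if the list turns out to be empty."""
    try:
        return values[0]
    except IndexError:
        return default


# Stand-ins for hxs.xpath(...).extract(): one page where the XPath matched,
# and one where it matched nothing (the case that raised the IndexError).
matched = [u'Paris']
unmatched = []

state_found = first_or_default(matched).encode('ascii', errors='ignore')
state_missing = first_or_default(unmatched).encode('ascii', errors='ignore')
state_missing_try = first_with_try(unmatched).encode('ascii', errors='ignore')
```

In the spider, the list passed in would be the result of `hxs.xpath(...).extract()`, so a page without that breadcrumb yields an empty `state` instead of aborting the callback. If your Scrapy version provides `extract_first()` on selector lists, it serves the same purpose by returning `None` when nothing matches.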