I am trying to parse reviews from this page: http://www.amazon.co.uk/product-reviews/B00143ZBHY
Using the following approach:
Code
html  # a variable containing the exact HTML of the page above
from lxml import etree
tree = etree.HTML(html)
r = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div[9]/text()[4]")
print len(r)
print r[0].tag
Output
0
Traceback (most recent call last):
File "c.py", line 37, in <module>
print r[0].tag
IndexError: list index out of range
P.S.: When I use the same XPath in the XPath Checker add-on for Firefox, it finds the node easily, but I get no result here. Please help!
Try removing /tbody from the XPath; there is no <tbody> inside #productReviews in the raw HTML.
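A quick way to see this: lxml's HTML parser (libxml2) keeps the table as written, whereas the browser DOM that Firefox's XPath Checker works against silently inserts a <tbody> wrapper. A minimal check on a toy table:

from lxml import etree

# libxml2 (used by lxml.etree.HTML) does not insert <tbody>,
# unlike the browser DOM the Firefox add-on queries
snippet = "<table id='productReviews'><tr><td>review</td></tr></table>"
tree = etree.HTML(snippet)

print tree.xpath("//*[@id='productReviews']/tbody")          # [] -- no <tbody> node
print tree.xpath("//*[@id='productReviews']/tr/td/text()")   # ['review']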
import urllib2
from lxml import etree

# fetch the raw HTML exactly as the server sends it (no browser-added <tbody>)
html = urllib2.urlopen("http://www.amazon.co.uk/product-reviews/B00143ZBHY").read()
tree = etree.HTML(html)

# same XPath as in the question, minus /tbody
r = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div[9]/text()[4]")
print r[0]
Output:
bought this as replacement for the original cover which came with my greenhouse and which ripped in the wind. so far this seems a good replacement although for some reason it seems slightly too small for my greenhouse so that i cant zip both sides of the front at the same time. seems sturdier and thicker than the cover i had before so hoping it lasts a bit longer!
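If you want more than that single text node, the same idea extends to a loop over the review <div>s. The div index and text() position in the question are tied to Amazon's markup at the time, so treat the following as a rough sketch under that assumption rather than a documented structure:

import urllib2
from lxml import etree

html = urllib2.urlopen("http://www.amazon.co.uk/product-reviews/B00143ZBHY").read()
tree = etree.HTML(html)

# assumption: each review sits in its own <div> inside td[1] of #productReviews,
# with the review body among that div's direct text nodes
for div in tree.xpath(".//*[@id='productReviews']/tr/td[1]/div"):
    texts = [t.strip() for t in div.xpath("text()") if t.strip()]
    if texts:
        print texts[-1]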