Why isn't lxml finding the XPath given by Chrome inspector?

Here is my code:

from lxml import html
import requests

page = requests.get('https://en.wikipedia.org/wiki/Nabucco')
tree = html.fromstring(page.content)
title = tree.xpath('//*[@id="mw-content-text"]/table[1]/tbody/tr[1]/th/i')
print(title)

Problem: print(title) prints "[]", an empty list. I expected it to print "Nabucco". The XPath expression comes from the Chrome inspector's "Copy XPath" function.

Why isn't this working? Is there a disagreement between lxml's and Chrome's XPath engines? Or am I missing something? I am somewhat new to Python, lxml, and XPath.

noctonura asked Nov 14 '15 17:11

1 Answer

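That's because of the tbody tag. You see it in the browser because the browser inserts it while building the DOM, even though it is absent from the page source. requests is not a browser; it just downloads the page source as is, so lxml parses a table with no tbody element and the copied path matches nothing.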

Replace:

//*[@id="mw-content-text"]/table[1]/tbody/tr[1]/th/i

with:

//*[@id="mw-content-text"]/table[1]/tr[1]/th/i
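
The difference can be reproduced without any network access. The following is a minimal sketch, assuming a small hand-written HTML snippet (RAW_SOURCE is a hypothetical stand-in, not the actual Wikipedia markup, which may also have changed since the question was asked). It parses source that has no tbody and compares the path copied from the browser, the path without tbody, and a union of the two that should match in either case:

```python
from lxml import html

# Hypothetical stand-in for the raw page source: there is no <tbody> in the
# markup, which is what requests would hand to lxml.
RAW_SOURCE = """
<div id="mw-content-text">
  <table>
    <tr><th><i>Nabucco</i></th></tr>
  </table>
</div>
""".strip()

tree = html.fromstring(RAW_SOURCE)

# Path as copied from the browser: it includes the tbody step.
browser_path = '//*[@id="mw-content-text"]/table[1]/tbody/tr[1]/th/i'
# Same path with the tbody step removed.
source_path = '//*[@id="mw-content-text"]/table[1]/tr[1]/th/i'
# Union of both paths: matches whether or not the source contains tbody.
tolerant_path = source_path + ' | ' + browser_path

with_tbody = tree.xpath(browser_path)
without_tbody = tree.xpath(source_path)
either = tree.xpath(tolerant_path)

print(with_tbody)                                    # no match for the tbody path
print([el.text_content() for el in without_tbody])  # text of the matched <i>
print([el.text_content() for el in either])
```

Note that xpath() returns a list of elements rather than strings, so printing the result directly shows element objects. To get the text "Nabucco", call text_content() on the matched element as above, or end the expression with /text().
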
alecxe answered Oct 10 '22 19:10
