Get list items inside a div tag using XPath

Question

I have HTML like this:

<div id="all-stories" class="book"> 
<ul>

<li title="Book1"  ><a href="book1_url">Book1</a></li>

<li title="Book2"  ><a href="book2_url">Book2</a></li>
</ul>

</div>

I want to get the books and their respective URLs using XPath, but my approach is not working. For simplicity, I tried to extract all the elements under the li tags as follows:

lis = tree.xpath('//div[@id="all-stories"]/div/text()')

unutbu · Accepted Answer

import lxml.html as LH

content = '''\
<div id="all-stories" class="book"> 
<ul>

<li title="Book1"  ><a href="book1_url">Book1</a></li>

<li title="Book2"  ><a href="book2_url">Book2</a></li>
</ul>

</div>
'''
root = LH.fromstring(content)
for atag in root.xpath('//div[@id="all-stories"]//li/a'):
    print(atag.attrib['href'], atag.text_content())

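yields (in Python 2, where the parenthesized arguments print as a tuple; Python 3 prints the same two values separated by a space, without parentheses or quotes)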

('book1_url', 'Book1')
('book2_url', 'Book2')

The XPath //div[@id="all-stories"]/div does not match anything because there is no child div inside the outer div tag.

The XPath //div[@id="all-stories"]/li would not match either, because there is no direct child li tag inside the div tag. However, //div[@id="all-stories"]//li does match the li tags, because // tells XPath to search descendants at any depth rather than only direct children.
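To see the difference between / and // concretely, here is a minimal sketch that runs the three XPaths discussed above against the HTML from the question. The variable names (child_div, child_li, descendant_li, titles) are only illustrative.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
<ul>
<li title="Book1"><a href="book1_url">Book1</a></li>
<li title="Book2"><a href="book2_url">Book2</a></li>
</ul>
</div>
'''
root = LH.fromstring(content)

# "/" selects direct children only: the div has no child div and no child li
# (the li tags sit one level deeper, inside the ul).
child_div = root.xpath('//div[@id="all-stories"]/div')
child_li = root.xpath('//div[@id="all-stories"]/li')

# "//" selects descendants at any depth, so the li tags inside the ul match.
descendant_li = root.xpath('//div[@id="all-stories"]//li')
titles = [li.get('title') for li in descendant_li]

print(len(child_div), len(child_li), len(descendant_li))
print(titles)
```

The first two queries return empty lists, and only the third finds the two li elements. Writing the path with explicit child steps, //div[@id="all-stories"]/ul/li, should match the same elements for this particular markup.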

Now, the content you are looking for is not in the li tag. It is inside the a tag. So instead use the XPath '//div[@id="all-stories"]//li/a' to reach the a tags. The value of the href attribute can be accessed with atag.attrib['href'], and the text with atag.text_content().
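If you would rather collect the results than print them, the following sketch shows two ways to do it: selecting the a elements and reading them in Python, or letting XPath return the strings directly with @href and text(). The names books, hrefs, names and pairs are only illustrative.

```python
import lxml.html as LH

content = '''\
<div id="all-stories" class="book">
<ul>
<li title="Book1"><a href="book1_url">Book1</a></li>
<li title="Book2"><a href="book2_url">Book2</a></li>
</ul>
</div>
'''
root = LH.fromstring(content)

# Option 1: select the a elements, then read the attribute and text in Python.
# a.get('href') returns None for a missing attribute, whereas
# a.attrib['href'] would raise KeyError.
books = [(a.get('href'), a.text_content())
         for a in root.xpath('//div[@id="all-stories"]//li/a')]

# Option 2: have XPath return the attribute values and text nodes as strings.
hrefs = root.xpath('//div[@id="all-stories"]//li/a/@href')
names = root.xpath('//div[@id="all-stories"]//li/a/text()')
# Pairing by position assumes every a tag has both an href and a text node.
pairs = list(zip(hrefs, names))

print(books)
print(pairs)
```

Option 1 is usually the safer choice, because each URL and its text come from the same element. Option 2 is shorter, but the two lists can fall out of step if some a tag lacks an href or has no text.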

Tags:

python

xpath

lxml

Anurag Sharma

1 Answer

unutbu
