
Python - Get Class from lxml xpath

Using Twitter purely as an example (and ignoring the fact that it has a perfectly usable API), the following script gets the fifth tweet currently shown on a user's page.

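import urllib2  # Python 2 standard library
from lxml import etree

# Absolute path to the <p> element of the fifth tweet in the page markup
xpathselector = "/html/body/div/div[2]/div/div[5]/div[2]/div/ol/li[5]/div/div/p"
url = "https://twitter.com/bmthofficial"

# Fetch the page and parse it with lxml's lenient HTML parser
response = urllib2.urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)

# xpath() returns a list of matching elements
result = tree.xpath(xpathselector)

print result[0].text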

At the time of this post, it prints:

From 2.30pm, win tickets to Reading Festival, and introduce

This prints the text content of the <p> element. How would I go about getting, for example, the class name of that <p> instead? Its HTML looks like this:

<p class="js-tweet-text tweet-text">From 2.30pm, win tickets to Reading Festival, and introduce <a dir="ltr" class="twitter-atreply pretty-link" href="/bmthofficial"><s>@</s><b>bmthofficial</b></a> onstage!</p>

Any help is appreciated! Thanks!

user1130601 asked Jan 12 '23 14:01



1 Answer

Use the get method of the element, which returns the value of the named attribute:

print result[0].get('class')

This prints:

js-tweet-text tweet-text
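
For anyone who wants to try this without fetching a live page, here is a minimal sketch that parses the <p> markup from the question as a string. The variable names (html, p, class_attr and so on) and the broad "//p" selector are illustrative assumptions, not part of the original script. It also shows two related options: passing a default to get for a missing attribute, and selecting the attribute directly in the XPath with @class.

```python
from lxml import etree

# Markup copied from the question; parsing a string avoids a network request.
html = (
    '<html><body>'
    '<p class="js-tweet-text tweet-text">From 2.30pm, win tickets to Reading '
    'Festival, and introduce <a dir="ltr" class="twitter-atreply pretty-link" '
    'href="/bmthofficial"><s>@</s><b>bmthofficial</b></a> onstage!</p>'
    '</body></html>'
)

# fromstring with an HTMLParser returns the root <html> element
root = etree.fromstring(html, etree.HTMLParser())
p = root.xpath("//p")[0]  # illustrative selector; the question uses an absolute path

class_attr = p.get("class")            # the raw attribute value as one string
class_names = class_attr.split()       # the individual class names
missing = p.get("id", "no-id")         # second argument is returned if the attribute is absent
via_xpath = root.xpath("//p/@class")   # XPath can also select the attribute itself

print(class_attr)
```

Note that the class attribute comes back as a single space-separated string, so split it if you need to test for one particular class.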
falsetru answered Jan 22 '23 04:01
