Using Twitter purely as an example, and ignoring the fact that it has a perfectly usable API, the following script gets the current 5th tweet from a user's page.
import urllib2
from lxml import etree
xpathselector = "/html/body/div/div[2]/div/div[5]/div[2]/div/ol/li[5]/div/div/p"
url = "https://twitter.com/bmthofficial"
response = urllib2.urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
result = tree.xpath(xpathselector)
print result[0].text
And at the time of this post it prints:
From 2.30pm, win tickets to Reading Festival, and introduce
So it prints the text content of the <p> element. How would I go about getting, for example, the class name of that <p>? Its HTML looks like this:
<p class="js-tweet-text tweet-text">From 2.30pm, win tickets to Reading Festival, and introduce <a dir="ltr" class="twitter-atreply pretty-link" href="/bmthofficial"><s>@</s><b>bmthofficial</b></a> onstage!</p>
Any help is appreciated! Thanks!
Use the get method of Element:
print result[0].get('class')
prints
js-tweet-text tweet-text
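As a rough sketch of what else get offers, the snippet below parses the <p> markup from the question as an inline string instead of fetching the live page, so it does not depend on Twitter's current layout. The variable names are illustrative only, and the fallback import is an assumption made for convenience: if lxml is not installed, the standard library's ElementTree is used, which exposes the same get and attrib interface for this case.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib parser, which offers the same
    # Element.get / Element.attrib interface for this well-formed snippet.
    import xml.etree.ElementTree as etree

# The <p> element quoted in the question, parsed from a string.
snippet = (
    '<p class="js-tweet-text tweet-text">From 2.30pm, win tickets to Reading '
    'Festival, and introduce <a dir="ltr" class="twitter-atreply pretty-link" '
    'href="/bmthofficial"><s>@</s><b>bmthofficial</b></a> onstage!</p>'
)
p = etree.fromstring(snippet)

# get() returns the raw attribute value as one string.
class_attr = p.get('class')
print(class_attr)            # js-tweet-text tweet-text

# The class attribute is space-separated, so split it to get each class.
classes = class_attr.split()
print(classes)               # ['js-tweet-text', 'tweet-text']

# A missing attribute yields None, or the default passed as second argument.
missing = p.get('id')
fallback = p.get('id', 'no-id')

# attrib behaves like a dictionary of all attributes on the element.
attributes = dict(p.attrib)

# The same call works on child elements, for example the link's href.
link = p.find('a')
href = link.get('href')
print(href)                  # /bmthofficial
```

Because get returns None for an attribute that is not present, it is usually safer than indexing attrib directly, which raises KeyError in that case.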