I am using the following code to get the url of an item:
node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']
It gives me something like:
itunes20170107.tbz
However, I'm looking to get the full url, which is:
https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170109.tbz
Is there an easy way to get the full url from lxml, without building it myself?
lxml.html simply returns the href attribute exactly as it appears in the HTML, which here is a relative link. If you want absolute links instead of relative ones, resolve them against the page URL with urljoin():
from urllib.parse import urljoin  # Python 3
# from urlparse import urljoin    # Python 2
# Note the trailing slash: without it, urljoin() treats "current" as a file name and drops it
url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
relative_url = node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']
absolute_url = urljoin(url, relative_url)
Demo:
>>> from urllib.parse import urljoin  # Python 3
>>>
>>> url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
>>>
>>> relative_url = "itunes20170107.tbz"
>>> absolute_url = urljoin(url, relative_url)
>>> absolute_url
'https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170107.tbz'
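Keep in mind that urljoin() follows standard relative-reference resolution (RFC 3986). Unless the base URL ends in /, its last path segment is treated as a file name and replaced, so a base ending in .../current resolves itunes20170107.tbz to .../incremental/itunes20170107.tbz. If the base URL comes from somewhere that may omit the slash, a small guard can add it. This is a minimal sketch: `join_in_directory` is a hypothetical helper, and it assumes the base is a plain directory URL with no query string or fragment.

```python
from urllib.parse import urljoin

def join_in_directory(base, relative):
    """Resolve `relative` against `base`, treating `base` as a directory.

    Hypothetical helper: assumes `base` has no query string or fragment,
    and appends a trailing slash so urljoin() keeps the last path segment
    instead of replacing it.
    """
    if not base.endswith("/"):
        base += "/"
    return urljoin(base, relative)

base = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current"

# Without the slash, "current" is replaced by the file name
print(urljoin(base, "itunes20170107.tbz"))
# With the guard, "current" is kept
print(join_in_directory(base, "itunes20170107.tbz"))
```

Absolute hrefs (for example https://...) and root-relative hrefs (for example /feeds/...) are still resolved normally by urljoin(), so the guard only changes the result for plain relative file names.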
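Another way to do it is to let lxml.html rewrite every link in the parsed document:
import requests
from lxml.html import fromstring
url = 'http://server.com'
response = requests.get(url)
doc = fromstring(response.content)
# Rewrite every relative link in the document against the page URL
doc.make_links_absolute(url)
# The question's XPath now returns the absolute URL
absolute_url = doc.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']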
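If lxml is not available, the same idea can be sketched with only the standard library: collect each <a href> with html.parser and resolve it with urljoin(), honoring a <base href> tag if the page has one (lxml's make_links_absolute() also takes <base href> into account by default). This is a rough illustration, not lxml's implementation; the class name `AbsoluteLinkCollector` and the sample HTML are made up, and the page is assumed to be already downloaded as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AbsoluteLinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for <a href> tags (hypothetical helper)."""

    def __init__(self, page_url):
        super().__init__()
        self.base = page_url
        self.links = []    # list of (link text, absolute URL)
        self._href = None  # absolute URL of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "base" and attrs.get("href"):
            # A <base href> overrides the page URL for later relative links
            self.base = urljoin(self.base, attrs["href"])
        elif tag == "a" and attrs.get("href") is not None:
            self._href = urljoin(self.base, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

page_url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
html = """
<table>
  <tr><td><a href="itunes20170107.tbz">itunes20170107.tbz</a></td></tr>
  <tr><td><a href="../">Parent Directory</a></td></tr>
</table>
"""

collector = AbsoluteLinkCollector(page_url)
collector.feed(html)
# Mirror the question's XPath filter: links whose text starts with "itunes"
itunes_links = [url for text, url in collector.links if text.startswith("itunes")]
print(itunes_links[0])
```

The lxml approach is shorter and also rewrites links in other attributes (such as src), so this sketch is mainly useful when adding a dependency is not an option.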