How to get absolute url from xpath?

Question

I am using the following code to get the url of an item:

node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']

It gives me something like:

itunes20170107.tbz

However, I'm looking to get the full url, which is:

https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170109.tbz

Is there an easy way to get the full url from lxml, without building it myself?

alecxe · Accepted Answer

lxml.html simply returns the href exactly as it is written in the HTML. To turn a relative link into an absolute one, join it with the URL of the page using urljoin():

from urllib.parse import urljoin  # Python 3
# from urlparse import urljoin  # Python 2

# Keep the trailing slash: without it, urljoin() replaces the last path segment ("current")
url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"

relative_url = node.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']
absolute_url = urljoin(url, relative_url)

Demo:

>>> from urllib.parse import urljoin  # Python 3
>>>
>>> url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
>>>
>>> relative_url = "itunes20170107.tbz"
>>> absolute_url = urljoin(url, relative_url)
>>> absolute_url
'https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/itunes20170107.tbz'
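
For reference, here is a small self-contained sketch of how the trailing slash on the base URL changes what urljoin() returns. The variable names are only illustrative; the URLs are the ones from the question.

```python
from urllib.parse import urljoin

base_without_slash = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current"
base_with_slash = base_without_slash + "/"
relative_url = "itunes20170107.tbz"

# No trailing slash: "current" is treated as a file name and gets replaced.
replaced = urljoin(base_without_slash, relative_url)

# Trailing slash: "current/" is treated as a directory and the name is appended.
appended = urljoin(base_with_slash, relative_url)

print(replaced)
print(appended)
```

If the base URL comes from user input or a config file, it may be worth checking for the trailing slash before joining.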

El Ruso · Answer

Another way to do it:

import requests
from lxml.html import fromstring

url = 'http://server.com'
response = requests.get(url)
doc = fromstring(response.text)
# Rewrites every link in the parsed document in place, resolving it against url
doc.make_links_absolute(url)
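
The same approach can be tried without a network call. This is a minimal sketch, assuming the page looks roughly like the table in the question; sample_html and page_url are made-up stand-ins for the real response body and page URL.

```python
from lxml import html

# Hypothetical stand-ins for the fetched page and its URL.
page_url = "https://feeds.itunes.apple.com/feeds/epf/v3/full/20170105/incremental/current/"
sample_html = (
    "<html><body><table><tr><td>"
    '<a href="itunes20170107.tbz">itunes20170107.tbz</a>'
    "</td></tr></table></body></html>"
)

doc = html.fromstring(sample_html)

# Resolve every link in the document against page_url, in place.
doc.make_links_absolute(page_url)

# The XPath from the question now yields an absolute href.
href = doc.xpath('//td/a[starts-with(text(),"itunes")]')[0].attrib['href']
print(href)
```

After make_links_absolute() has run, every later XPath query on the same tree sees absolute links, so no per-link urljoin() call is needed.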

Tags: python, xpath, lxml

David542

