
How to construct relative url, given two absolute urls in Python

Is there a built-in function that returns a relative URL such as ../images.html, given a base URL like http://www.example.com/faq/index.html and a target URL such as http://www.example.com/images.html?

I checked the urlparse module. What I want is the counterpart of the urljoin() function.
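To make the goal concrete, here is a minimal sketch of the forward direction that urljoin() already covers; the question asks for the inverse. It assumes Python 3, where the urlparse module's functions live in urllib.parse, and the variable names are purely illustrative.

```python
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

base = "http://www.example.com/faq/index.html"
relative = "../images.html"

# urljoin resolves a relative reference against a base URL.
absolute = urljoin(base, relative)
print(absolute)  # http://www.example.com/images.html
```

The function being asked for would take base and absolute and return relative.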

yasar asked Sep 19 '11 10:09




1 Answer

You could use urlparse.urlparse (urllib.parse.urlparse in Python 3) to extract the paths, and the posixpath version of os.path.relpath to compute the relative path.

(Warning: This works for Linux, but may not for Windows):

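import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def relurl(target, base):
    """Return target as a URL relative to base (both absolute, same netloc)."""
    base = urlparse(base)
    target = urlparse(target)
    if base.netloc != target.netloc:
        raise ValueError('target and base netlocs do not match')
    # Prefix with '.' so the URL paths are treated as relative paths;
    # an empty path and '/' then normalize to the same directory.
    base_dir = '.' + posixpath.dirname(base.path)
    target = '.' + target.path
    return posixpath.relpath(target, start=base_dir)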

tests=[
    ('http://www.example.com/images.html','http://www.example.com/faq/index.html','../images.html'),
    ('http://google.com','http://google.com','.'),
    ('http://google.com','http://google.com/','.'),
    ('http://google.com/','http://google.com','.'),
    ('http://google.com/','http://google.com/','.'), 
    ('http://google.com/index.html','http://google.com/','index.html'),
    ('http://google.com/index.html','http://google.com/index.html','index.html'), 
    ]

for target,base,answer in tests:
    try:
        result=relurl(target,base)
    except ValueError as err:
        print('{t!r},{b!r} --> {e}'.format(t=target,b=base,e=err))
    else:
        if result==answer:
            print('{t!r},{b!r} --> PASS'.format(t=target,b=base))
        else:
            print('{t!r},{b!r} --> {r!r} != {a!r}'.format(
                t=target,b=base,r=result,a=answer))
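
Note that relurl above only looks at the path component, so a query string or fragment on the target is dropped, and posixpath.relpath also normalizes away a trailing slash. The following is a rough sketch of one way to carry those over, not part of the original answer. The name relurl_keep_suffix is hypothetical, the scheme check is an added assumption, and it targets Python 3. Edge cases such as a target that names the base directory without a trailing slash are not handled.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def relurl_keep_suffix(target, base):
    """Hypothetical variant of relurl that keeps query, fragment and trailing slash."""
    t = urlsplit(target)
    b = urlsplit(base)
    if (t.scheme, t.netloc) != (b.scheme, b.netloc):
        raise ValueError('target and base must share scheme and netloc')
    # Treat an empty path as the root so both paths are absolute.
    t_path = t.path or '/'
    b_dir = posixpath.dirname(b.path or '/')
    rel = posixpath.relpath(t_path, start=b_dir)
    # relpath strips a trailing slash; put it back for directory-style targets.
    if t_path.endswith('/') and rel != '.':
        rel += '/'
    if t.query:
        rel += '?' + t.query
    if t.fragment:
        rel += '#' + t.fragment
    return rel


base = 'http://www.example.com/faq/index.html'
target = 'http://www.example.com/images.html?size=2#top'
rel = relurl_keep_suffix(target, base)
print(rel)                 # ../images.html?size=2#top
print(urljoin(base, rel))  # http://www.example.com/images.html?size=2#top
```

Feeding the result back through urljoin, as in the last line, is a quick way to check that a relative URL resolves to the intended target.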
unutbu answered Oct 12 '22 23:10
