Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Comparing two urls in Python

Tags:

python

url

Is there a standard way to compare two urls in Python - that implements are_url_the_same in this example:

url_1 = 'http://www.foo.com/bar?a=b&c=d'
url_2 = 'http://www.foo.com:80/bar?c=d;a=b'

if are_urls_the_same(url_1, url2):
    print "URLs are the same"

By the same I mean that they access the same resource - so the two urls in the example are the same.

like image 349
EvdB Avatar asked Mar 20 '11 22:03

EvdB


3 Answers

Here is a simple class that enables you to do this:

if Url(url1) == Url(url2):
    pass

It could easily be revamped as a function, though these objects are hashable, and therefore enable you to add them into a cache using a set or dictionary:

# Python 2
# from urlparse import urlparse, parse_qsl
# from urllib import unquote_plus
# Python 3
from urllib.parse import urlparse, parse_qsl, unquote_plus
    
class Url(object):
    '''A url object that can be compared with other url orbjects
    without regard to the vagaries of encoding, escaping, and ordering
    of parameters in query strings.'''

    def __init__(self, url):
        parts = urlparse(url)
        _query = frozenset(parse_qsl(parts.query))
        _path = unquote_plus(parts.path)
        parts = parts._replace(query=_query, path=_path)
        self.parts = parts

    def __eq__(self, other):
        return self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)
like image 132
twneale Avatar answered Oct 18 '22 22:10

twneale


Use urlparse and write a comparison function with the fields that you need

>>> from urllib.parse import urlparse
>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')

And you can compare on any of the following:

  1. scheme 0 URL scheme specifier
  2. netloc 1 Network location part
  3. path 2 Hierarchical path
  4. params 3 Parameters for last path element
  5. query 4 Query component
  6. fragment 5 Fragment identifier
  7. username User name
  8. password Password
  9. hostname Host name (lower case)
  10. port Port number as integer, if present
like image 40
fabrizioM Avatar answered Oct 18 '22 22:10

fabrizioM


Lib https://github.com/rbaier/urltools

Have a look at my project i am doing the same thing

https://github.com/tg123/tao.bb/blob/master/url_normalize.py

like image 4
farmer1992 Avatar answered Oct 18 '22 21:10

farmer1992