Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse custom URIs with urlparse (Python)

My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module refuses to parse unknown URL schemes like it parses http.

If I do not adjust urlparse's uses_* lists I get this:

>>> urlparse.urlparse("qqqq://base/id#hint")
('qqqq', '', '//base/id#hint', '', '', '')
>>> urlparse.urlparse("http://base/id#hint")
('http', 'base', '/id', '', '', 'hint')

Here is what I do, and I wonder if there is a better way to do it:

import urlparse

SCHEME = "qqqq"

# One would hope that there was a better way to do this
urlparse.uses_netloc.append(SCHEME)
urlparse.uses_fragment.append(SCHEME)

Why is there no better way to do this?

like image 377
u0b34a0f6ae Avatar asked Sep 13 '09 15:09

u0b34a0f6ae


People also ask

How does Urlparse work in Python?

The urlparse module contains functions to process URLs, and to convert between URLs and platform-specific filenames. Example 7-16 demonstrates. A common use is to split an HTTP URL into host and path components (an HTTP request involves asking the host to return data identified by the path), as shown in Example 7-17.

What does Urllib parse Urlencode do?

parse. urlencode() method can be used for generating the query string of a URL or data for a POST request.

What is Netloc Urlparse?

urlparse. netloc is the name of the server (ip address or host name). – Paul Rooney. Jan 1, 2019 at 2:59.


2 Answers

You can also register a custom handler with urlparse:

import urlparse

def register_scheme(scheme):
    for method in filter(lambda s: s.startswith('uses_'), dir(urlparse)):
        getattr(urlparse, method).append(scheme)

register_scheme('moose')

This will append your url scheme to the lists:

uses_fragment
uses_netloc
uses_params
uses_query
uses_relative

The uri will then be treated as http-like and will correctly return the path, fragment, username/password etc.

urlparse.urlparse('moose://username:password@hostname:port/path?query=value#fragment')._asdict()
=> {'fragment': 'fragment', 'netloc': 'username:password@hostname:port', 'params': '', 'query': 'query=value', 'path': '/path', 'scheme': 'moose'}
like image 174
toothygoose Avatar answered Sep 24 '22 01:09

toothygoose


I think the problem is that URI's don't all have a common format after the scheme. For example, mailto: urls aren't structured the same as http: urls.

I would use the results of the first parse, then synthesize an http url and parse it again:

parts = urlparse.urlparse("qqqq://base/id#hint")
fake_url = "http:" + parts[2]
parts2 = urlparse.urlparse(fake_url)
like image 20
Ned Batchelder Avatar answered Sep 22 '22 01:09

Ned Batchelder