Parse custom URIs with urlparse (Python)

Tags: python, url, python-2.6, urlparse

My application creates custom URIs (or URLs?) to identify objects and resolve them. The problem is that Python's urlparse module does not parse unknown URL schemes the way it parses http.

If I do not adjust urlparse's uses_* lists I get this:

>>> urlparse.urlparse("qqqq://base/id#hint")
('qqqq', '', '//base/id#hint', '', '', '')
>>> urlparse.urlparse("http://base/id#hint")
('http', 'base', '/id', '', '', 'hint')

Here is what I do, and I wonder if there is a better way to do it:

import urlparse

SCHEME = "qqqq"

# One would hope that there was a better way to do this
urlparse.uses_netloc.append(SCHEME)
urlparse.uses_fragment.append(SCHEME)

Why is there no better way to do this?

asked Sep 13 '09 15:09

u0b34a0f6ae

2 Answers

You can also register a custom scheme with urlparse by appending it to every uses_* list:

import urlparse

def register_scheme(scheme):
    # Append the scheme to every module-level uses_* list
    for name in dir(urlparse):
        if name.startswith('uses_'):
            getattr(urlparse, name).append(scheme)

register_scheme('moose')

This appends your URL scheme to the following lists:

uses_fragment
uses_netloc
uses_params
uses_query
uses_relative

The URI will then be treated as http-like, and the parse will correctly return the path, fragment, username/password, etc.

urlparse.urlparse('moose://username:password@hostname:port/path?query=value#fragment')._asdict()
=> {'fragment': 'fragment', 'netloc': 'username:password@hostname:port', 'params': '', 'query': 'query=value', 'path': '/path', 'scheme': 'moose'}
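For anyone who wants to try this on both old and new interpreters, here is a minimal sketch of the same idea. The `urlmod` alias, the duplicate check, and the numeric port `8080` are my own additions, not part of the answer above. As far as I know, recent Python versions already split netloc, query and fragment for unknown schemes, so the registration mostly matters on older releases such as the 2.6 this question is tagged with.

```python
try:
    import urlparse as urlmod          # Python 2
except ImportError:
    import urllib.parse as urlmod      # Python 3

def register_scheme(scheme):
    """Add `scheme` to every module-level uses_* list, skipping duplicates."""
    for name in dir(urlmod):
        value = getattr(urlmod, name)
        if name.startswith('uses_') and isinstance(value, list) and scheme not in value:
            value.append(scheme)

register_scheme('moose')
register_scheme('moose')  # calling it again does not add a second entry

parts = urlmod.urlparse('moose://username:password@hostname:8080/path?query=value#fragment')
print(parts.netloc, parts.path, parts.query, parts.fragment)
```

The duplicate check is only there so that calling the function from several modules does not keep growing the lists.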

answered Sep 24 '22 01:09

toothygoose

I think the problem is that URIs don't all have a common format after the scheme. For example, mailto: URLs aren't structured the same way as http: URLs.
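To make the difference concrete, here is a small sketch comparing the two. The addresses are made-up examples, and the import fallback is only there so it runs on either Python 2 or 3.

```python
try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3

# A mailto: URI has no '//' authority part, so everything lands in the path.
mail = urlparse('mailto:someone@example.com')
# An http: URL has a network location followed by a path.
web = urlparse('http://example.com/inbox')

print(mail.netloc, mail.path)
print(web.netloc, web.path)
```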

I would use the results of the first parse, then synthesize an http url and parse it again:

import urlparse

# For an unregistered scheme, parts[2] (the path) still holds '//base/id#hint'
parts = urlparse.urlparse("qqqq://base/id#hint")
fake_url = "http:" + parts[2]
parts2 = urlparse.urlparse(fake_url)
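A variant of the same trick, as a rough sketch: instead of relying on what the first parse leaves in the path, swap the scheme on the raw string, parse the result as http, and then put the real scheme back. The helper name `parse_as_http` is hypothetical, and this assumes the URI really is http-shaped after the scheme.

```python
try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3

def parse_as_http(uri):
    """Parse `uri` as if its scheme were http, then restore the real scheme."""
    scheme, sep, rest = uri.partition(':')
    if not sep:
        raise ValueError('no scheme in %r' % (uri,))
    parsed = urlparse('http:' + rest)
    # The result is a namedtuple, so _replace returns a copy with the scheme swapped.
    return parsed._replace(scheme=scheme)

result = parse_as_http('qqqq://base/id#hint')
print(result)
```

This leaves the module-level uses_* lists untouched, which may be preferable if other code in the same process depends on the default behaviour.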

answered Sep 22 '22 01:09

Ned Batchelder
