Python: How to check if a string is a valid IRI?

Question

Is there a standard function to check whether a string is a valid IRI? To check a URL, apparently I can use:

parts = urlparse.urlsplit(url)
if not parts.scheme or not parts.netloc:
    '''apparently not an url'''

I tried the above with a URL containing Unicode characters:

# -*- coding: utf-8 -*-
import urlparse

url = "http://fdasdf.fdsfîășîs.fss/ăîăî"
parts = urlparse.urlsplit(url)
if not parts.scheme or not parts.netloc:
    print "not an url"
else:
    print "yes an url"

and what I get is "yes an url". Does this mean I'm good and this tests for a valid IRI? Is there another way?

Martijn Pieters · Accepted Answer

Using urlparse is not sufficient to test for a valid IRI.
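
To see why, here is a minimal sketch of the check from the question, written for Python 3 (an assumption; there the function lives in urllib.parse rather than in the urlparse module). The helper name looks_like_url and the sample string are made up for illustration. The string contains a space and angle brackets, none of which RFC 3987 permits in an IRI, yet the scheme/netloc test still passes:

```python
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit


def looks_like_url(candidate):
    """Return True if urlsplit finds both a scheme and a network location."""
    parts = urlsplit(candidate)
    return bool(parts.scheme and parts.netloc)


# Hypothetical input: the space and the angle brackets are not legal IRI characters.
not_an_iri = "http://exa mple.com/<not allowed>"
print(looks_like_url(not_an_iri))
```

urlsplit only splits the string into components; it does not check which characters each component contains.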

Use the rfc3987 package instead:

from rfc3987 import parse

parse('http://fdasdf.fdsfîășîs.fss/ăîăî', rule='IRI')
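
If a plain True/False answer is more convenient, the call above can be wrapped in a small helper. This is only a sketch: it assumes the third-party rfc3987 package is installed and that parse raises ValueError for a string that does not match the requested rule, as the package's documentation describes. The name is_valid_iri is made up here.

```python
def is_valid_iri(candidate):
    """Return True if candidate matches the absolute 'IRI' rule of RFC 3987."""
    # Third-party dependency (pip install rfc3987), imported here so the
    # requirement is visible next to the call that needs it.
    from rfc3987 import parse

    try:
        parse(candidate, rule='IRI')
    except ValueError:
        # Assumption: parse signals a non-matching string with ValueError.
        return False
    return True
```

The helper can then be used directly in an if statement, for example `if is_valid_iri(url): ...`. To accept relative references as well, the rule would be 'IRI_reference' instead of 'IRI'.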

ecatmur · Answer

The only character-set-sensitive code in the implementation of urlparse is the requirement that the scheme contain only ASCII letters, digits, and the characters +, - and .; otherwise it is completely agnostic, so it works fine with non-ASCII characters.

As this is undocumented behaviour, it's your responsibility to check that it continues to be the case (with tests in your project), but I don't imagine it would be changed to break IRIs.
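
Such a project-level check could look like the sketch below. It is written for Python 3, where the function is urllib.parse.urlsplit (an assumption; the question uses the Python 2 urlparse module), and the class name, test names, and the non-ASCII scheme string are invented for illustration.

```python
import unittest
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit


class UrlsplitIriBehaviour(unittest.TestCase):
    """Pin down the undocumented behaviour this project relies on."""

    def test_non_ascii_host_and_path_are_preserved(self):
        parts = urlsplit("http://fdasdf.fdsfîășîs.fss/ăîăî")
        self.assertEqual(parts.scheme, "http")
        self.assertEqual(parts.netloc, "fdasdf.fdsfîășîs.fss")
        self.assertEqual(parts.path, "/ăîăî")

    def test_non_ascii_scheme_is_not_recognised(self):
        # Hypothetical input: the scheme contains letters outside ASCII,
        # so urlsplit should not treat the prefix as a scheme at all.
        parts = urlsplit("hțțp://example.com/")
        self.assertEqual(parts.scheme, "")
        self.assertEqual(parts.netloc, "")
```

Saved in a test module, this runs with `python -m unittest` and fails as soon as a future Python release changes either behaviour.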

urllib provides quoting functions to convert IRIs to/from ASCII URIs, although they still don't mention IRIs explicitly in the documentation, and they are broken in some cases; see the question "Is there a unicode-ready substitute I can use for urllib.quote and urllib.unquote in Python 2.6.5?"
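
As an illustration of that conversion, the sketch below percent-encodes the non-ASCII path from the question and decodes it again. It assumes Python 3, where the functions are urllib.parse.quote and urllib.parse.unquote and default to UTF-8; in Python 2 they are urllib.quote and urllib.unquote and operate on byte strings. Only the path is handled here; a non-ASCII host name needs IDNA encoding, which these functions do not perform.

```python
from urllib.parse import quote, unquote  # Python 2: urllib.quote / urllib.unquote

iri_path = "/ăîăî"

# quote() encodes the text as UTF-8 and percent-escapes every byte that is
# not an unreserved ASCII character; "/" is left alone by default (safe="/").
uri_path = quote(iri_path)

# unquote() reverses the escaping and decodes the bytes as UTF-8 again.
round_tripped = unquote(uri_path)

print(uri_path)
print(round_tripped == iri_path)
```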

Tags:

python

url

python-2.7

urlparse

Eduard Florinescu
