
urllib not raising invalid URL

I'm getting some curious behaviour when parsing a URL. I was expecting to receive an invalid URL exception, but instead the hostname of the following URL comes back as the text inside the '[]' brackets:

from urllib.parse import urlparse
print(urlparse('http://myurl.com[notmyurl.com]').hostname)

Output:

>>> notmyurl.com

Is this expected behaviour?

Nick Martin asked Nov 19 '25 11:11


1 Answer

This is expected behavior. Running your code through a debugger and stepping through urllib's parse.py, we see the following:

@property
def _hostinfo(self):
    netloc = self.netloc
    _, _, hostinfo = netloc.rpartition('@')
    _, have_open_br, bracketed = hostinfo.partition('[')
    if have_open_br:
        hostname, _, port = bracketed.partition(']')
        _, _, port = port.partition(':')
    else:
        hostname, _, port = hostinfo.partition(':')
    if not port:
        port = None
    return hostname, port

So you can see that the _hostinfo property checks for brackets in the netloc and, if it finds an opening bracket, returns the value from inside the brackets as the hostname. Anything before the '[' is discarded. Below is a screenshot of running your code through the PyCharm debugger. The code window shows the value assigned to each variable and where it starts stripping out the part of the netloc that is not inside the brackets.
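If you don't have a debugger handy, the same steps can be reproduced on a plain string. This is only a sketch that mirrors the partition calls quoted above; `split_hostinfo` is a made-up name, not part of urllib, and it additionally returns the discarded prefix so you can see what gets thrown away.

```python
def split_hostinfo(netloc):
    """Mirror the partition steps of the quoted _hostinfo property.

    Returns (discarded_prefix, hostname, port). The discarded prefix is
    extra, added here for illustration only.
    """
    # Drop any userinfo before the last '@'.
    _, _, hostinfo = netloc.rpartition('@')
    # Split on the first '['; whatever sits before it is not used again.
    prefix, have_open_br, bracketed = hostinfo.partition('[')
    if have_open_br:
        # The hostname is whatever sits between '[' and the first ']'.
        hostname, _, port = bracketed.partition(']')
        _, _, port = port.partition(':')
    else:
        prefix = ''
        hostname, _, port = hostinfo.partition(':')
    return prefix, hostname, port or None


print(split_hostinfo('myurl.com[notmyurl.com]'))
# ('myurl.com', 'notmyurl.com', None)
print(split_hostinfo('example.com:8080'))
# ('', 'example.com', '8080')
```

The first call shows 'myurl.com' landing in the discarded prefix, which is why only 'notmyurl.com' is reported as the hostname.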

[Screenshot: PyCharm debugger stepping through _hostinfo]
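If you want an exception for input like this, one option is to check the brackets yourself before trusting the hostname. The following is a minimal sketch, not urllib's own validation: `strict_hostname` is a hypothetical helper, and it assumes that brackets are only legitimate when they wrap an IPv6 literal that makes up the whole host. Behaviour may also differ between Python releases, and some newer ones appear to reject this kind of input with a ValueError on their own, so the helper lets that error propagate.

```python
import ipaddress
from urllib.parse import urlsplit


def strict_hostname(url):
    """Return the hostname, or raise ValueError on malformed bracket usage."""
    parts = urlsplit(url)  # may itself raise ValueError on some versions
    _, _, hostinfo = parts.netloc.rpartition('@')
    if '[' in hostinfo or ']' in hostinfo:
        # Assumption: brackets must wrap the entire host, optionally
        # followed by ':port'. Anything else is treated as invalid.
        if not hostinfo.startswith('['):
            raise ValueError(f"unexpected text before '[' in {hostinfo!r}")
        literal, closed, rest = hostinfo[1:].partition(']')
        if not closed or (rest and not rest.startswith(':')):
            raise ValueError(f"malformed bracketed host in {hostinfo!r}")
        # Raises a ValueError subclass if the literal is not valid IPv6.
        ipaddress.IPv6Address(literal)
    return parts.hostname


print(strict_hostname('http://example.com:8080/path'))  # example.com
print(strict_hostname('http://[::1]:8080/'))            # ::1
try:
    strict_hostname('http://myurl.com[notmyurl.com]')
except ValueError as exc:
    print('rejected:', exc)
```

With this check the URL from the question is rejected either by the helper or by urlsplit itself, depending on the Python version in use.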

Chris Doyle answered Nov 22 '25 02:11