Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

urlparse fails with simple url

this simple code makes urlparse get crazy and it does not get the hostname properly but sets it up to None:

from urllib.parse import urlparse
parsed = urlparse("google.com/foo?bar=8")
print(parsed.hostname)

Am I missing something?

like image 658
user1618465 Avatar asked May 24 '18 00:05

user1618465


People also ask

How do you split a URL in Python?

Method #1 : Using split() ' and return the first part of split for result.

What is URL parse?

URL Parsing. The URL parsing functions focus on splitting a URL string into its components, or on combining URL components into a URL string.


1 Answers

According to https://www.rfc-editor.org/rfc/rfc1738#section-2.1:

Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").

Using advice given in previous answers, I wrote this helper function which can be used in place of urllib.parse.urlparse():

#!/usr/bin/env python3
import re
import urllib.parse

def urlparse(address):
    if not re.search(r'^[A-Za-z0-9+.\-]+://', address):
        address = 'tcp://{0}'.format(address)
    return urllib.parse.urlparse(address)

url = urlparse('localhost:1234')
print(url.hostname, url.port)

A previous version of this function called urllib.parse.urlparse(address), and then prepended the "tcp" scheme if one wasn't found; but this interprets the username as the scheme if you pass it something like "user:pass@localhost:1234".

like image 94
Huw Walters Avatar answered Oct 18 '22 15:10

Huw Walters