This simple code confuses urlparse: it does not extract the hostname and sets it to None instead:
from urllib.parse import urlparse
parsed = urlparse("google.com/foo?bar=8")
print(parsed.hostname)
Am I missing something?
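The behaviour can be reproduced in a few lines. This is a minimal sketch, relying on the documented rule that urlparse only recognises a network location when it is introduced by "//"; the variable names are illustrative and not from the question:

```python
from urllib.parse import urlparse

# Without "//", the "google.com/foo" part is treated as a path, not a host.
bare = urlparse("google.com/foo?bar=8")
print(bare.netloc, bare.path, bare.hostname)

# With a leading "//" (or a full scheme such as "https://"), the host is found.
relative = urlparse("//google.com/foo?bar=8")
absolute = urlparse("https://google.com/foo?bar=8")
print(relative.hostname, absolute.hostname)
```

So nothing is wrong with the parser itself: the input simply lacks the "//" that marks where the host begins.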
Method #1: Using split() and returning the first part of the result.
URL Parsing. The URL parsing functions focus on splitting a URL string into its components, or on combining URL components into a URL string.
According to https://www.rfc-editor.org/rfc/rfc1738#section-2.1:
Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
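The character set quoted above translates fairly directly into a regular expression. The following is only a sketch: the names SCHEME_PREFIX and has_scheme are hypothetical, and requiring "://" after the scheme is an assumption that suits host-style addresses but would reject schemes such as "mailto:".

```python
import re

# Letters, digits, "+", "." and "-" as listed in RFC 1738 section 2.1,
# followed by "://". IGNORECASE treats "HTTP" the same as "http".
SCHEME_PREFIX = re.compile(r'^[a-z0-9+.\-]+://', re.IGNORECASE)

def has_scheme(address):
    """Return True if address starts with something shaped like 'scheme://'."""
    return SCHEME_PREFIX.match(address) is not None

print(has_scheme("HTTP://google.com"))    # scheme present
print(has_scheme("google.com/foo"))       # no scheme
print(has_scheme("localhost:1234"))       # "host:port" is not "scheme://"
```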
Using advice given in previous answers, I wrote this helper function, which can be used in place of urllib.parse.urlparse():
#!/usr/bin/env python3
import re
import urllib.parse

def urlparse(address):
    # Prepend a default scheme when the address does not start with "scheme://".
    if not re.search(r'^[A-Za-z0-9+.\-]+://', address):
        address = 'tcp://{0}'.format(address)
    return urllib.parse.urlparse(address)

url = urlparse('localhost:1234')
print(url.hostname, url.port)
A previous version of this function called urllib.parse.urlparse(address) first, and then prepended the "tcp" scheme if one wasn't found; but this interprets the username as the scheme if you pass it something like "user:pass@localhost:1234".
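The difference between the two orderings can be seen directly. This is a small sketch using the example address from above; the names naive and fixed are illustrative only:

```python
import urllib.parse

address = "user:pass@localhost:1234"

# Parsing first: the text before the first ":" is taken as the scheme.
naive = urllib.parse.urlparse(address)
print(naive.scheme, naive.hostname)

# Prepending a scheme first: credentials, host and port are all recognised.
fixed = urllib.parse.urlparse("tcp://" + address)
print(fixed.username, fixed.hostname, fixed.port)
```

Checking for "scheme://" with a regular expression before parsing avoids the first case, which is why the helper tests the raw string rather than the parsed result.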