urlparse fails with simple url

Tags:

python

python-3.x

urlparse

This simple code trips up urlparse: it does not extract the hostname and returns None instead:

from urllib.parse import urlparse
parsed = urlparse("google.com/foo?bar=8")
print(parsed.hostname)

Am I missing something?

asked May 24 '18 00:05

user1618465

1 Answer

According to https://www.rfc-editor.org/rfc/rfc1738#section-2.1:

Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
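As a rough illustration of the quoted rule, the sketch below checks whether a string starts with something shaped like "scheme://", using only the characters the RFC passage lists. The names SCHEME_PREFIX and has_scheme are made up for this example and are not part of urllib. The last line shows that urllib.parse.urlparse() itself lower-cases the scheme it finds.

```python
import re
from urllib.parse import urlparse

# Characters allowed in a scheme name per the RFC 1738 passage quoted above,
# followed by the "://" separator. Upper case is accepted for resiliency.
SCHEME_PREFIX = re.compile(r'[A-Za-z0-9+.\-]+://')

def has_scheme(address):
    """Return True if address begins with something shaped like 'scheme://'."""
    return SCHEME_PREFIX.match(address) is not None

print(has_scheme('HTTP://example.com/'))        # True
print(has_scheme('google.com/foo?bar=8'))       # False
print(urlparse('HTTP://example.com/').scheme)   # http
```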

Using advice given in previous answers, I wrote this helper function which can be used in place of urllib.parse.urlparse():

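#!/usr/bin/env python3
import re
import urllib.parse

def urlparse(address):
    # Without a leading "scheme://", urllib.parse.urlparse() does not
    # recognise the host, so prepend a default scheme first.
    if not re.match(r'[A-Za-z0-9+.\-]+://', address):
        address = 'tcp://{0}'.format(address)
    return urllib.parse.urlparse(address)

url = urlparse('localhost:1234')
print(url.hostname, url.port)  # localhost 1234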

A previous version of this function called urllib.parse.urlparse(address) first and prepended the "tcp" scheme only if none was found. That approach misreads the username as the scheme when you pass it something like "user:pass@localhost:1234".
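To make the pitfall concrete, here is a small sketch comparing a plain urllib.parse.urlparse() call with the "check for a scheme first" approach. The function name parse_with_default_scheme and its default_scheme parameter are hypothetical, added only so the question's URL can be tried with "http" instead of "tcp".

```python
import re
import urllib.parse

def parse_with_default_scheme(address, default_scheme='tcp'):
    # Hypothetical variant of the helper above: the default scheme is a parameter.
    if not re.match(r'[A-Za-z0-9+.\-]+://', address):
        address = '{0}://{1}'.format(default_scheme, address)
    return urllib.parse.urlparse(address)

# Parsing first: everything before the first ":" is taken as the scheme.
naive = urllib.parse.urlparse('user:pass@localhost:1234')
print(naive.scheme, naive.hostname)  # user None

# Checking for "scheme://" first keeps the credentials where they belong.
fixed = parse_with_default_scheme('user:pass@localhost:1234')
print(fixed.scheme, fixed.username, fixed.hostname, fixed.port)  # tcp user localhost 1234

# The URL from the question, with "http" as the assumed scheme.
q = parse_with_default_scheme('google.com/foo?bar=8', default_scheme='http')
print(q.hostname, q.path, q.query)  # google.com /foo bar=8
```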

answered Oct 18 '22 15:10

Huw Walters
