Suppose I have a string of the format host:port, where :port is optional. How can I reliably extract the two components?
The host can be any of: a hostname (localhost, www.google.com), an IPv4 address (1.2.3.4), or an IPv6 address in square brackets ([aaaa:bbbb::cccc]).
In other words, this is the standard format used across the internet (such as in URIs: complete grammar at https://www.rfc-editor.org/rfc/rfc3986#section-3.2, excluding the "User Information" component).
So, some possible inputs, and desired outputs:
'localhost' -> ('localhost', None)
'my-example.com:1234' -> ('my-example.com', 1234)
'1.2.3.4' -> ('1.2.3.4', None)
'[0abc:1def::1234]' -> ('[0abc:1def::1234]', None)
The URL parsing functions focus on splitting a URL string into its components, or on combining URL components into a URL string. Parse a URL into six components, returning a 6-item named tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment .
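As a rough illustration of the six-component split described above, here is a minimal sketch using urllib.parse.urlparse(). The URL is a made-up example, chosen only so that every field is populated.

```python
from urllib.parse import urlparse

# Hypothetical URL, picked so that all six components are non-empty.
parts = urlparse("https://www.example.com:8443/docs/index.html;v=1?q=host#top")

print(parts.scheme)    # https
print(parts.netloc)    # www.example.com:8443
print(parts.path)      # /docs/index.html
print(parts.params)    # v=1
print(parts.query)     # q=host
print(parts.fragment)  # top
```

The host:port pair from the question lives in the netloc component.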
To reverse this encoding process, parse_qs() and parse_qsl() are provided in this module to parse query strings into Python data structures. Refer to urllib examples to find out how the urllib.parse.urlencode() method can be used for generating the query string of a URL or data for a POST request.
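A small round trip may make the relationship between these functions clearer. This is only a sketch; the keys and values are invented for the example.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

# Build a query string from a dict (hypothetical example data).
query = urlencode({"host": "example.com", "port": 8080})
print(query)      # host=example.com&port=8080

# parse_qs() maps each key to a list of values.
as_dict = parse_qs(query)
print(as_dict)    # {'host': ['example.com'], 'port': ['8080']}

# parse_qsl() returns a flat list of (key, value) pairs.
as_pairs = parse_qsl(query)
print(as_pairs)   # [('host', 'example.com'), ('port', '8080')]
```

Note that parsed values always come back as strings, so the port would still need an int() conversion.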
The return value is a named tuple, which means that its items can be accessed by index or as named attributes. Reading the port attribute will raise a ValueError if an invalid port is specified in the URL. See section Structured Parse Results for more information on the result object.
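To see both behaviours in practice, here is a short sketch with urlsplit(). The host names and the out-of-range port are made up for the example.

```python
from urllib.parse import urlsplit

ok = urlsplit("//example.com:8080")
# The same item is reachable by index and by attribute name.
print(ok[1], ok.netloc)      # example.com:8080 example.com:8080
print(ok.hostname, ok.port)  # example.com 8080

# Parsing itself succeeds; the error only appears when .port is read.
bad = urlsplit("//example.com:99999")
try:
    bad.port
    port_error = None
except ValueError as exc:
    port_error = exc
print("invalid port:", port_error)
```

Because the error is raised lazily, code that needs validation has to touch the port attribute explicitly.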
Well, this is Python, with batteries included. You have mentioned that the format is the standard one used in URIs, so how about urllib.parse?
import urllib.parse

def parse_hostport(hp):
    # urlparse() and urlsplit() only recognize a network location after a leading "//"
    result = urllib.parse.urlsplit('//' + hp)
    return result.hostname, result.port
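This should handle any valid host:port you can throw at it. Note that result.hostname is lowercased and comes back without the square brackets of an IPv6 literal, so '[0abc:1def::1234]' yields '0abc:1def::1234'.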
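If the brackets around an IPv6 host need to be kept, as in the desired outputs listed in the question, one possible variation is to add them back afterwards. This is a sketch only: split_hostport is a hypothetical name, and it assumes that a colon in the parsed hostname always means an IPv6 literal.

```python
import urllib.parse

def split_hostport(hp):
    """Split 'host[:port]' and keep the brackets around IPv6 literals."""
    result = urllib.parse.urlsplit('//' + hp)
    host = result.hostname
    # Assumption: only IPv6 literals contain ':' once the port is removed.
    if host is not None and ':' in host:
        host = '[' + host + ']'
    return host, result.port

print(split_hostport('localhost'))            # ('localhost', None)
print(split_hostport('my-example.com:1234'))  # ('my-example.com', 1234)
print(split_hostport('1.2.3.4'))              # ('1.2.3.4', None)
print(split_hostport('[0abc:1def::1234]'))    # ('[0abc:1def::1234]', None)
```

The host is still lowercased by urlsplit(), which is harmless for host names but worth knowing.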
Came up with a dead simple regexp that seems to work in most cases:
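import re

def get_host_pair(value):
    # Lazily capture the host, then an optional trailing ":<digits>" port
    return re.search(r'^(.*?)(?::(\d+))?$', value).groups()

get_host_pair('localhost')     # ('localhost', None)
get_host_pair('localhost:80')  # ('localhost', '80')
get_host_pair('[::1]')         # ('[::1]', None)
get_host_pair('[::1]:8080')    # ('[::1]', '8080')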
It probably doesn't work when the base input is invalid, however.
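The regexp returns the port as a string. A possible extension, sketched here with the same pattern, converts it to an int and rejects out-of-range values. get_host_pair_int is a hypothetical name, and the 0-65535 range check is an addition that is not in the original answer.

```python
import re

# Same pattern as above: lazy host, optional trailing ":<digits>" port.
_HOSTPORT_RE = re.compile(r'^(.*?)(?::(\d+))?$')

def get_host_pair_int(value):
    host, port = _HOSTPORT_RE.search(value).groups()
    if port is None:
        return host, None
    port = int(port)
    if not 0 <= port <= 65535:
        raise ValueError('port out of range: %d' % port)
    return host, port

print(get_host_pair_int('localhost:80'))  # ('localhost', 80)
print(get_host_pair_int('[::1]:8080'))    # ('[::1]', 8080)
# An unbracketed IPv6 address is ambiguous: '::1' is split as (':', 1).
print(get_host_pair_int('::1'))           # (':', 1)
```

The last call shows the kind of invalid input the pattern cannot detect: the final group of an unbracketed IPv6 address is mistaken for a port.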