Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I parse a host:port pair in Python

Tags:

python

Suppose I have a string of the of the format host:port, where :port is optional. How can I reliably extract the two components?

The host can be any of:

  • A hostname (localhost, www.google.com)
  • An IPv4 literal (1.2.3.4)
  • An IPv6 literal ([aaaa:bbbb::cccc]).

In other words, this is the standard format used across the internet (such as in URIs: complete grammar at https://www.rfc-editor.org/rfc/rfc3986#section-3.2, excluding the "User Information" component).

So, some possible inputs, and desired outputs:

'localhost' -> ('localhost', None)
'my-example.com:1234' -> ('my-example.com', 1234)
'1.2.3.4' -> ('1.2.3.4', None)
'[0abc:1def::1234]' -> ('[0abc:1def::1234]', None)
like image 321
richvdh Avatar asked Oct 22 '17 16:10

richvdh


People also ask

How do I parse a URL in Python?

The URL parsing functions focus on splitting a URL string into its components, or on combining URL components into a URL string. Parse a URL into six components, returning a 6-item named tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment .

How do I use a port scanner in Python?

This small port scanner program will try to connect on every port you define for a particular host. The first thing we must do is import the socket library and other libraries that we need. Open up an text editor, copy & paste the code below. Save the file as: “portscanner.py” and exit the editor. #!/usr/bin/env python import socket import ...

How to parse query strings Into Python data structures?

To reverse this encoding process, parse_qs() and parse_qsl() are provided in this module to parse query strings into Python data structures. Refer to urllib examples to find out how the urllib.parse.urlencode() method can be used for generating the query string of a URL or data for a POST request.

What is the return value of the port attribute in Python?

The return value is a named tuple, which means that its items can be accessed by index or as named attributes, which are: Reading the port attribute will raise a ValueError if an invalid port is specified in the URL. See section Structured Parse Results for more information on the result object.


2 Answers

Well, this is Python, with batteries included. You have mention that the format is the standard one used in URIs, so how about urllib.parse?

import urllib.parse

def parse_hostport(hp):
    # urlparse() and urlsplit() insists on absolute URLs starting with "//"
    result = urllib.parse.urlsplit('//' + hp)
    return result.hostname, result.port

This should handle any valid host:port you can throw at it.

like image 178
twisteroid ambassador Avatar answered Oct 30 '22 09:10

twisteroid ambassador


Came up with a dead simple regexp that seems to work in most cases:

def get_host_pair(value):
    return re.search(r'^(.*?)(?::(\d+))?$', value).groups()

get_host_pair('localhost')
get_host_pair('localhost:80')
get_host_pair('[::1]')
get_host_pair('[::1]:8080')

It probably doesn't work when the base input is invalid however

like image 41
Romuald Brunet Avatar answered Oct 30 '22 10:10

Romuald Brunet