
What is the cleanest way to replace a hostname in a URL with Python?

In Python, there is a standard library module urllib.parse that deals with parsing URLs:

>>> import urllib.parse
>>> p = urllib.parse.urlparse("https://127.0.0.1:6443")
>>> p
ParseResult(scheme='https', netloc='127.0.0.1:6443', path='', params='', query='', fragment='')

There are also properties on urllib.parse.ParseResult that return the hostname and the port:

>>> p.hostname
'127.0.0.1'
>>> p.port
6443
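
For reference, here is a small sketch of how these properties behave on a URL that also carries userinfo and a bracketed IPv6 host. The URLs are made up for illustration; the point is that hostname is a normalized view (brackets stripped, lowercased) rather than the literal text found in netloc:

```python
import urllib.parse

# Illustrative URL with userinfo and a bracketed IPv6 host (not from the question).
p = urllib.parse.urlparse("https://User:Pw@[::1]:6443/x")
print(p.netloc)     # User:Pw@[::1]:6443
print(p.username)   # User
print(p.password)   # Pw
print(p.hostname)   # ::1   (brackets stripped)
print(p.port)       # 6443  (an int, not a str)

q = urllib.parse.urlparse("https://X.com")
print(q.netloc)     # X.com
print(q.hostname)   # x.com (lowercased)
```

Because of this normalization, searching netloc for p.hostname is not guaranteed to find the original text.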

And, by virtue of ParseResult being a namedtuple, it has a _replace() method that returns a new ParseResult with the given field(s) replaced:

>>> p._replace(netloc="foobar.tld")
ParseResult(scheme='https', netloc='foobar.tld', path='', params='', query='', fragment='')

However, it cannot replace hostname or port because they are dynamic properties rather than fields of the tuple:

>>> p._replace(hostname="foobar.tld")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.11/collections/__init__.py", line 455, in _replace
    raise ValueError(f'Got unexpected field names: {list(kwds)!r}')
ValueError: Got unexpected field names: ['hostname']
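
The difference between fields and properties can be checked on the class itself. This is only a quick inspection sketch; _fields is the standard namedtuple attribute, and the variable names are mine:

```python
import urllib.parse

# The six tuple fields that _replace() accepts.
fields = urllib.parse.ParseResult._fields
print(fields)

# hostname is not among them; it is a read-only property computed from netloc.
is_field = "hostname" in fields
is_property = isinstance(urllib.parse.ParseResult.hostname, property)
print(is_field, is_property)  # False True
```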

It might be tempting to simply concatenate the new hostname with the existing port and pass it as the new netloc:

>>> p._replace(netloc='{}:{}'.format("foobar.tld", p.port))
ParseResult(scheme='https', netloc='foobar.tld:6443', path='', params='', query='', fragment='')

However, this quickly turns into a mess if we consider:

  • the fact that the port is optional;
  • the fact that netloc may also contain the username and possibly the password (e.g. https://user:password@example.org);
  • the fact that IPv6 literals must be wrapped in brackets (i.e. https://::1 isn't valid but https://[::1] is);
  • and maybe something else that I'm missing.
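
To make the failure modes concrete, here is a sketch that wraps the naive concatenation in a helper (naive_replace is a name I made up, and the URLs are illustrative) and runs it against one input per bullet:

```python
import urllib.parse

def naive_replace(url, new_host):
    """The tempting approach: glue the new host onto the parsed port."""
    p = urllib.parse.urlparse(url)
    return p._replace(netloc="{}:{}".format(new_host, p.port)).geturl()

# No port in the original: p.port is None and gets formatted literally.
no_port = naive_replace("https://example.org/path", "foobar.tld")
print(no_port)    # https://foobar.tld:None/path

# Userinfo in the original: it is silently dropped.
userinfo = naive_replace("https://user:pw@example.org:8443/", "foobar.tld")
print(userinfo)   # https://foobar.tld:8443/

# IPv6 replacement: no brackets are added, so host and port run together.
ipv6 = naive_replace("https://127.0.0.1:6443", "::1")
print(ipv6)       # https://::1:6443
```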

What is the cleanest, correct way to replace the hostname in a URL in Python?

The solution must handle IPv6 (both as a part of the original URL and as the replacement value), URLs containing username/password, and in general all well-formed URLs.

(There is a wide assortment of existing posts that try to ask the same question, but none of them ask for (or provide) a solution that fits all of the criteria above.)

intelfx asked Dec 29 '25 10:12


1 Answer

Nice nerd snipe. Quite difficult to get right.

import urllib.parse
import socket

def is_ipv6(s):
    try:
        socket.inet_pton(socket.AF_INET6, s)
    except Exception:
        return False
    else:
        return True

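def host_replace(url, new_host):
    parsed = urllib.parse.urlparse(url)
    # Drop any "user:password@" prefix; what remains is host[:port].
    _, _, host = parsed.netloc.rpartition("@")
    _, sep, bracketed = host.partition("[")
    if sep:
        # Bracketed IPv6 literal: keep only the text between "[" and "]".
        host, _, _ = bracketed.partition("]")
        ipv6 = True
    else:
        # Hostname or IPv4 address: strip the optional ":port" suffix.
        host, _, _ = host.partition(':')
        ipv6 = False
    new_ipv6 = is_ipv6(new_host)
    if ipv6 and not new_ipv6:
        # IPv6 -> non-IPv6: match the brackets too, so they are removed.
        host = f"[{host}]"
    elif not ipv6 and new_ipv6:
        # Non-IPv6 -> IPv6: the replacement must bring its own brackets.
        new_host = f"[{new_host}]"
    port = parsed.port
    netloc = parsed.netloc
    if port is not None:
        # Set the port aside so the host search cannot match inside it
        # (str.removesuffix needs Python 3.9+).
        netloc = netloc.removesuffix(f":{port}")
    # Replace the last occurrence of the host, which sits after any userinfo.
    left, sep, right = netloc.rpartition(host)
    new_netloc = left + new_host + right
    if port is not None:
        new_netloc += f":{port}"
    new_url = parsed._replace(netloc=new_netloc).geturl()
    return new_url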

I also include my test-cases:

tests = [
    ("https://x.com", "example.org", "https://example.org"),
    ("https://X.com", "example.org", "https://example.org"),
    ("https://x.com/", "example.org", "https://example.org/"),
    ("https://x.com/i.html", "example.org", "https://example.org/i.html"),
    ("https://x.com:8888", "example.org", "https://example.org:8888"),
    ("https://u@x.com:8888", "example.org", "https://u@example.org:8888"),
    ("https://u:p@x.com:8888", "example.org", "https://u:p@example.org:8888"),
    ("https://[::1]:1234", "example.org", "https://example.org:1234"),
    ("https://[::1]:1234", "::2", "https://[::2]:1234"),
    ("https://x.com", "::2", "https://[::2]"),
    ("http://u:p@80:80", "foo", "http://u:p@foo:80"),
]
for url, new_host, expect in tests:
    actual = host_replace(url, new_host)
    assert actual == expect, f"\n{actual=}\n{expect=}"
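
For comparison, here is a shorter variant, offered as a sketch rather than a drop-in replacement. It slices netloc into userinfo, host and port text and swaps only the middle piece, so the port is carried over verbatim instead of being rebuilt from parsed.port. The name replace_host_sketch is made up, and using ipaddress.IPv6Address for detection (instead of socket.inet_pton) is my assumption about what counts as an IPv6 literal. It expects new_host without brackets.

```python
import ipaddress
import urllib.parse

def replace_host_sketch(url, new_host):
    """Swap the host part of netloc, keeping userinfo and port text as-is."""
    parts = urllib.parse.urlsplit(url)
    # userinfo is everything before the last "@"; `at` is "" if there is none.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if hostport.startswith("["):
        # Bracketed IPv6 literal: the port text (if any) follows "]".
        _, _, port_part = hostport.partition("]")
    else:
        # Hostname or IPv4: the port text (if any) follows the first ":".
        _, colon, port = hostport.partition(":")
        port_part = colon + port
    try:
        ipaddress.IPv6Address(new_host)
    except ValueError:
        pass  # not an IPv6 literal, use it unchanged
    else:
        new_host = f"[{new_host}]"
    netloc = userinfo + at + new_host + port_part
    return parts._replace(netloc=netloc).geturl()

print(replace_host_sketch("https://u:p@[::1]:1234/i.html", "example.org"))
print(replace_host_sketch("https://x.com:8888", "::2"))
```

The two calls print https://u:p@example.org:1234/i.html and https://[::2]:8888 respectively.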
wim answered Dec 31 '25 23:12


