I can be given a string in any of these formats:
url: e.g. http://www.acme.com:456
string: e.g. www.acme.com:456, www.acme.com 456, or www.acme.com
I would like to extract the host and, if present, the port. If no port is present, I would like it to default to 80.
I have tried urlparse, which works fine for the URL but not for the other formats. For example, when I use urlparse on hostname:port, it puts the hostname in the scheme rather than in netloc.
I would be happy with a solution that uses urlparse and a regex, or a single regex that could handle both formats.
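A single-regex sketch, assuming the four input shapes listed above are the only ones that need support. The names `HOST_PORT_RE` and `parse_host_port` are illustrative, not from any library. The pattern accepts an optional `scheme://` prefix, then a host, then an optional port after `:` or whitespace, and ignores any trailing path, query, or fragment.

```python
import re

# One pattern for all the input shapes in the question (illustrative, not exhaustive).
HOST_PORT_RE = re.compile(
    r"""
    ^\s*
    (?:[A-Za-z][A-Za-z0-9+.-]*://)?   # optional scheme, e.g. "http://"
    (?P<host>[^\s:/?#]+)              # host: everything up to a separator
    (?:[:\s]\s*(?P<port>\d+))?        # optional ":456" or " 456"
    (?:[/?#].*)?                      # optional path, query or fragment (ignored)
    \s*$
    """,
    re.VERBOSE,
)


def parse_host_port(value, default_port=80):
    """Return (host, port); port falls back to default_port when absent."""
    match = HOST_PORT_RE.match(value)
    if match is None:
        raise ValueError(f"cannot parse host/port from {value!r}")
    port = match.group("port")
    return match.group("host"), (int(port) if port else default_port)


for s in ["http://www.acme.com:456", "www.acme.com:456", "www.acme.com 456", "www.acme.com"]:
    print(s, "->", parse_host_port(s))
```

This pattern does not handle IPv6 literals such as `[::1]:8080` or `user@host` credentials; for those inputs a `urlparse`-based approach is more robust.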
URL parsing is a function of traffic-management and load-balancing products, which scan URLs to determine how to forward traffic across different links or to different servers. A URL includes a protocol identifier (http, for Web traffic) and a resource name, such as www.microsoft.com.
Java URL getHost() method: the getHost() method of the URL class returns the hostname of the URL. If the host is a literal IPv6 address, the method returns it enclosed in square brackets ('[' and ']').
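For comparison with Java's getHost(), a small Python sketch (the helper name `describe_host` is illustrative) shows how `urllib.parse.urlsplit` treats an IPv6 host: `netloc` keeps the brackets, while `hostname` returns the address without them and `port` is parsed into an integer separately.

```python
from urllib.parse import urlsplit


def describe_host(url):
    parts = urlsplit(url)
    # netloc keeps the raw "[...]:port" text; hostname drops the brackets
    # (and lowercases the host); port is an int, or None when absent.
    return parts.netloc, parts.hostname, parts.port


print(describe_host("http://[::1]:8080/index.html"))
# ('[::1]:8080', '::1', 8080)
```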
In JavaScript there are two common methods. Method 1: use the createElement() method to create an HTML anchor (<a>) element, then use it to parse the given URL.
Method 2: use the URL() constructor to create a new URL object, then use it to parse the provided URL.
You can use urlparse to get the hostname from a URL string:
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

print(urlparse("http://www.website.com/abc/xyz.html").hostname)  # prints www.website.com
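The parse result also has a `.port` attribute, so the urlparse answer can be stretched to cover the non-URL inputs from the question. A minimal sketch, assuming Python 3.6+ (`urllib.parse`); the function name `host_and_port` is illustrative. Strings without `://` get a `//` prefix so that urlparse puts them in `netloc` instead of `scheme`, and a whitespace separator is first rewritten to `:`.

```python
from typing import Tuple
from urllib.parse import urlparse


def host_and_port(value: str, default_port: int = 80) -> Tuple[str, int]:
    """Return (host, port) for 'http://host:port', 'host:port', 'host port' or 'host'."""
    text = value.strip()
    if "://" not in text:
        # Turn the "host port" form into "host:port"; other forms are unchanged.
        text = ":".join(text.split())
        # A leading "//" makes urlparse treat the text as a network location,
        # so "host:port" no longer lands in .scheme.
        text = "//" + text
    parsed = urlparse(text)
    if not parsed.hostname:
        raise ValueError(f"no host found in {value!r}")
    # .port is None when no port is given; on Python 3.6+ it raises
    # ValueError for a non-integer or out-of-range (0-65535) port.
    port = parsed.port if parsed.port is not None else default_port
    return parsed.hostname, port


for s in ["http://www.acme.com:456", "www.acme.com:456", "www.acme.com 456", "www.acme.com"]:
    print(s, "->", host_and_port(s))
```

Note that `.hostname` is lowercased, and the default of 80 is applied regardless of scheme, as the question asks; an `https://` URL without a port would also get 80 unless you pick the default per scheme.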