
Parsing hostname and port from string or url

I can be given a string in any of these formats:

  • URL: e.g. http://www.acme.com:456

  • string: e.g. www.acme.com:456, www.acme.com 456, or www.acme.com

I would like to extract the host and, if present, the port. If the port is not present, I would like it to default to 80.

I have tried urlparse, which works fine for the URL, but not for the other formats. When I use urlparse on hostname:port, for example, it puts the hostname in the scheme rather than in netloc.

I would be happy with a solution that uses urlparse and a regex, or a single regex that could handle both formats.

TonyM asked Mar 02 '12 09:03

1 Answer

You can use urlparse to get the hostname from a URL string:

# Python 3; on Python 2 use: from urlparse import urlparse
from urllib.parse import urlparse

print(urlparse("http://www.website.com/abc/xyz.html").hostname)  # prints www.website.com
Maksym Kozlenko answered Oct 13 '22 23:10
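
The answer above only covers the full URL form. For the bare-string formats in the question, one possible approach is to normalize the input so that urlsplit always sees a network location. The following is a minimal sketch, assuming Python 3; the helper name parse_host_port and the constant DEFAULT_PORT are hypothetical and not part of the original answer.

```python
from urllib.parse import urlsplit

DEFAULT_PORT = 80  # assumption taken from the question: fall back to port 80


def parse_host_port(value, default_port=DEFAULT_PORT):
    """Return (hostname, port) for a URL, 'host:port', 'host port', or 'host'."""
    value = value.strip()
    if "://" not in value:
        # Turn "host port" into "host:port", then prefix "//" so that
        # urlsplit treats the text as a network location, not a scheme or path.
        value = "//" + ":".join(value.split())
    parts = urlsplit(value)
    # parts.port is None when no port was given; it raises ValueError
    # if the port is not a valid number.
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port


if __name__ == "__main__":
    for text in ("http://www.acme.com:456", "www.acme.com:456",
                 "www.acme.com 456", "www.acme.com"):
        print(text, "->", parse_host_port(text))
```

Prefixing "//" avoids the problem described in the question, where the hostname can end up in the scheme. Note that urlsplit lowercases the hostname, and input with more than one space-separated token after the host is not handled by this sketch.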