I'd like to add the 'http' scheme in front of a given URL string if it's missing, and otherwise leave the URL alone. I thought urlparse was the right way to do this, but whenever there's no scheme and I call geturl(), I get '///' instead of '//' between the scheme and the domain.
>>> t = urlparse.urlparse('www.example.com', 'http')
>>> t.geturl()
'http:///www.example.com' # three ///
How do I convert this url so it actually looks like:
'http://www.example.com' # two //
The url.parse() method takes a URL string, parses it, and returns a URL object with each part of the address as a property.
The urlsplit() function is an alternative to urlparse(). It behaves a little differently, because it does not split the parameters from the URL. This is useful for URLs following RFC 2396, which supports parameters for each segment of the path.
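To make the difference concrete, here is a small sketch comparing the two functions on a URL that carries a parameter. The URL is made up for illustration, and the import is the Python 3 spelling (in Python 2 both functions live in the urlparse module).

```python
from urllib.parse import urlparse, urlsplit  # Python 2: from urlparse import ...

# Hypothetical URL with a ";type=a" parameter on the last path segment.
url = "http://www.example.com/go;type=a?x=1#top"

parsed = urlparse(url)  # six fields: params is split out of the path
split = urlsplit(url)   # five fields: params stay inside the path

print(parsed.path, parsed.params)  # /go type=a
print(split.path)                  # /go;type=a
```

If you never deal with path parameters, either function should work for checking whether a scheme is present.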
The urlparse module contains functions to process URLs, and to convert between URLs and platform-specific filenames. Example 7-16 demonstrates. A common use is to split an HTTP URL into host and path components (an HTTP request involves asking the host to return data identified by the path), as shown in Example 7-17.
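As a rough illustration of splitting an HTTP URL into host and path (this is not the book's example, just a sketch with a made-up URL, using the Python 3 import):

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

# Hypothetical URL used only for illustration.
parts = urlparse("http://www.example.com:8080/docs/index.html?lang=en")

host = parts.hostname    # host name without the port
port = parts.port        # port as an int, or None if absent
path = parts.path or "/" # an empty path means the root document

print(host, port, path)  # www.example.com 8080 /docs/index.html
```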
Short answer (but it's a bit tautological):
>>> urlparse.urlparse("http://www.example.com").geturl()
'http://www.example.com'
In your example code, the hostname is parsed as a path, not a network location:
>>> urlparse.urlparse("www.example.com/go")
ParseResult(scheme='', netloc='', path='www.example.com/go', params='', \
query='', fragment='')
>>> urlparse.urlparse("http://www.example.com/go")
ParseResult(scheme='http', netloc='www.example.com', path='/go', params='', \
query='', fragment='')
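Since a URL without a scheme ends up entirely in path, one way around the '///' problem is to skip geturl() for that case and prepend the scheme by hand. The following is a minimal sketch, not the only approach. The helper name ensure_scheme is made up for this example, and the import is the Python 3 spelling (in Python 2, use from urlparse import urlparse).

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse


def ensure_scheme(url, default_scheme="http"):
    """Return url unchanged if it already has a scheme; otherwise prefix one.

    ensure_scheme is a hypothetical helper written for this answer.
    """
    if urlparse(url).scheme:
        return url
    # Drop leading slashes so "//www.example.com" does not gain extra ones.
    return "{}://{}".format(default_scheme, url.lstrip("/"))


print(ensure_scheme("www.example.com"))         # http://www.example.com
print(ensure_scheme("http://www.example.com"))  # http://www.example.com
```

One caveat: an input such as 'www.example.com:80' (host and port, no scheme) may be read as having a scheme, depending on the Python version, so you may want to handle that case separately.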