I have urls formatted as:
google.com
www.google.com
http://google.com
http://www.google.com
I would like to convert all type of links to a uniform format, starting with http://
http://google.com
How can I prepend URLs with http://
using Python?
Python do have builtin functions to treat that correctly, like
p = urlparse.urlparse(my_url, 'http')
netloc = p.netloc or p.path
path = p.path if p.netloc else ''
if not netloc.startswith('www.'):
netloc = 'www.' + netloc
p = urlparse.ParseResult('http', netloc, path, *p[3:])
print(p.geturl())
If you want to remove (or add) the www
part, you have to edit the .netloc
field of the resulting object before calling .geturl()
.
Because ParseResult
is a namedtuple, you cannot edit it in-place, but have to create a new object.
PS:
For Python3, it should be urllib.parse.urlparse
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With