Function in Python to clean up and normalize a URL

Question

I am using URLs as a key so I need them to be consistent and clean. I need a python function that will take a URL and clean it up so that I can do a get from the DB. For example, it will take the following:

example.com
example.com/
http://example.com/
http://example.com
http://example.com?
http://example.com/?
http://example.com//

and output a clean consistent version:

http://example.com/

I looked through std libs and GitHub and couldn't find anything like this

Update

I couldn't find a Python library that implements everything discussed here and in the RFC:

http://en.wikipedia.org/wiki/URL_normalization

So I am writing one now. There is a lot more to this than I initially imagined.

samplebias · Accepted Answer

Take a look at urlparse.urlparse(). I've had good success with it.

note: This answer is from 2011 and is specific to Python2. In Python3 the urlparse module has been named to urllib.parse. The corresponding Python3 documentation for urllib.parse can be found here:

https://docs.python.org/3/library/urllib.parse.html

Function in Python to clean up and normalize a URL

Tags:

python

url

nikcub

1 Answers

samplebias

Recent Activity

Donate For Us

Function in Python to clean up and normalize a URL

Tags:

python

url

nikcub

1 Answers

samplebias

Related questions

Recent Activity

Donate For Us