
How to split a web address [duplicate]

I'm using Python to parse web pages, and I want to split a full web address into two parts. Say I have the address http://www.stackoverflow.com/questions/ask. I need the protocol and domain (e.g. http://www.stackoverflow.com) and the path (e.g. /questions/ask). I figured a regex might solve this, but I'm not very handy with regexes. Any suggestions?

The.Anti.9 asked Dec 05 '25 06:12


1 Answer

Dan is right: urlparse is your friend.

>>> from urlparse import urlparse
>>>
>>> parts = urlparse("http://www.stackoverflow.com/questions/ask")
>>> parts.scheme + "://" + parts.netloc
'http://www.stackoverflow.com'
>>> parts.path
'/questions/ask'

Note: In Python 3 the urlparse module was renamed, so the import becomes: from urllib.parse import urlparse
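
For Python 3, the same split could be wrapped in a small helper along these lines. This is only a sketch: the function name split_url is made up for illustration and is not part of the standard library. It assumes an absolute URL with a scheme, and it leaves any query string or fragment out of both parts.

```python
from urllib.parse import urlparse  # Python 3 location of urlparse


def split_url(url):
    """Split an absolute URL into (scheme + domain, path).

    split_url is a hypothetical helper name used for illustration.
    The query string and fragment, if any, are not included in either part.
    """
    parts = urlparse(url)
    base = parts.scheme + "://" + parts.netloc
    return base, parts.path


base, path = split_url("http://www.stackoverflow.com/questions/ask")
print(base)  # http://www.stackoverflow.com
print(path)  # /questions/ask
```

If the query string matters too, it is available as parts.query on the same result object.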

Ned Batchelder answered Dec 07 '25 22:12
