
Regular expression that matches the third instance of something? (python)

Tags: python, regex

I'm trying to create a regular expression that will match the third instance of a / in a URL, so that only the scheme and the website's name are kept and everything from the third / onwards is dropped.

So http://www.stackoverflow.com/questions/answers/help/ after being put through the regex will be http://www.stackoverflow.com

I've been playing about with them myself and come up with:

base_url = re.sub(r'[/].*', r'', url)

But all this does is reduce a link to http:, so clearly I need to match the third instance of / rather than the first. Can anyone explain how I would do this?
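For reference, the behaviour described above can be reproduced with the short sketch below. It also shows one possible pattern that stops at the third slash; that second pattern is an illustrative assumption, not something taken from the original post, and it only expects URLs of the form scheme://host/path.

```python
import re

url = "http://www.stackoverflow.com/questions/answers/help/"

# The original pattern starts matching at the FIRST slash, and the greedy
# ".*" then consumes the rest of the string, so only "http:" survives.
first_attempt = re.sub(r"[/].*", "", url)

# One possible fix (an assumption, not from the post): capture the scheme,
# the "//", and everything up to but not including the third slash, then
# replace the whole match with just that captured group.
base_url = re.sub(r"^([^/]*//[^/]*).*", r"\1", url)

print(first_attempt)  # http:
print(base_url)       # http://www.stackoverflow.com
```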

Thanks!

Jingo asked Apr 26 '26 10:04


1 Answer

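I suggest you use urlparse for parsing URLs (in Python 3 it lives in urllib.parse; in Python 2, import it from the urlparse module instead):

In [1]: from urllib.parse import urlparse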

In [2]: urlparse('http://www.stackoverflow.com/questions/answers/help/').netloc
Out[2]: 'www.stackoverflow.com'

.netloc includes the port number if present (e.g. www.stackoverflow.com:80); if you don't want the port number, use .hostname instead.
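To get the exact output asked for in the question (scheme plus host), the parsed pieces can be joined back together. The following is a minimal sketch for Python 3; the helper name base_url and its keep_port flag are hypothetical and exist only for illustration.

```python
from urllib.parse import urlparse  # Python 2: from urlparse import urlparse


def base_url(url, keep_port=True):
    """Return 'scheme://host' for url (hypothetical helper for illustration)."""
    parts = urlparse(url)
    # .netloc keeps the port (and any user info); .hostname drops both
    # and is returned in lower case.
    host = parts.netloc if keep_port else parts.hostname
    return "{}://{}".format(parts.scheme, host)


print(base_url("http://www.stackoverflow.com/questions/answers/help/"))
# http://www.stackoverflow.com
print(base_url("http://www.stackoverflow.com:80/questions/", keep_port=False))
# http://www.stackoverflow.com
```

Unlike a regex that counts slashes, this relies on the standard library's URL parsing, so it does not depend on where the slashes happen to fall in the path.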

NPE answered Apr 29 '26 01:04


