Python regular expression again - match url

Tags:

python

regex

I have this regexp:

 re.compile(r"((https?):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", re.MULTILINE|re.UNICODE)

But it doesn't match hashbangs (#!). What do I need to change to get it working? I know I can add ! to the character class alongside #@% etc., but then it will also match something like

Check this out: http://example.com/something/!!!

and I want to avoid that.
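
To make the problem concrete, here is a minimal reproduction. The hashbang URL and the names `naive`, `hashbang_text` and `shout_text` are made up for illustration; only the first pattern is the one shown above.

```python
import re

FLAGS = re.MULTILINE | re.UNICODE

# The pattern from above, unchanged.
original = re.compile(r"((https?):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", FLAGS)

# Naive variant (illustration only): "!" added next to "#@%" in the class.
naive = re.compile(r"((https?):((//)|(\\\\))+[\w\d:#!@%/;$()~_?\+-=\\\.&]*)", FLAGS)

hashbang_text = "See http://example.com/#!/user/profile for details"
shout_text = "Check this out: http://example.com/something/!!!"

# The original pattern stops right before the "!" of the hashbang.
print(original.search(hashbang_text).group(0))  # http://example.com/#

# The naive variant keeps the hashbang ...
print(naive.search(hashbang_text).group(0))     # http://example.com/#!/user/profile

# ... but it also swallows the trailing exclamation marks.
print(naive.search(shout_text).group(0))        # http://example.com/something/!!!
```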

ThomK asked Jul 16 '11 16:07


2 Answers

Don't try to write your own regular expression for matching URLs; use one from someone who has already solved these problems, like this one.

kindall answered Oct 01 '22 21:10


Such a regex could be very long, but in practice mine works pretty well. Please try this one:

 ((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*

It matches all of the examples below:

http://wwww.stackoverflow.com
abc.com
http://test.test-75.1474.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
[email protected]
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:[email protected]/etcetc
(www.itmag.com)
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with
www/[email protected]
[email protected].
[email protected] 
[email protected]     
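
As a rough check, the pattern can be run through Python's `re` module. This is only a sketch: `first_url` is a hypothetical helper, and `HASHBANG_PATTERN` is an assumed variant (not the pattern above) that accepts `!` only when a letter, digit, or slash follows it, which is one possible way to handle the hashbang case from the question.

```python
import re

# Pattern from this answer, unchanged apart from being compiled in Python.
URL_PATTERN = re.compile(
    r"((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}"
    r"([a-zA-Z0-9\.\&\/\?\:@\-_=#])*"
)

# Assumed variant: the trailing part also accepts "!", but only when the
# lookahead sees a letter, digit, or "/" right after it.
HASHBANG_PATTERN = re.compile(
    r"((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}"
    r"(?:[a-zA-Z0-9\.\&\/\?\:@\-_=#]|!(?=[a-zA-Z0-9\/]))*"
)


def first_url(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


print(first_url(URL_PATTERN, "abc.com"))           # abc.com
print(first_url(URL_PATTERN, "(www.itmag.com)"))   # www.itmag.com

# The unchanged pattern stops at the "!" of a hashbang.
print(first_url(URL_PATTERN, "http://example.com/#!/user/profile"))
# http://example.com/#

# The variant keeps the hashbang but drops trailing exclamation marks.
print(first_url(HASHBANG_PATTERN, "http://example.com/#!/user/profile"))
# http://example.com/#!/user/profile
print(first_url(HASHBANG_PATTERN, "Check this out: http://example.com/something/!!!"))
# http://example.com/something/
```

Note that "matches" here means `search`, not `fullmatch`: for an input like (www.itmag.com) the parentheses are left out of the match.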
Asad answered Oct 01 '22 23:10