
Gruber’s URL Regular Expression in Python

How do I rewrite Gruber's new regular expression for recognising URLs so that it works in Python?

\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

Tobias asked Dec 31 '09 16:12



2 Answers

The original source for that pattern states "This pattern should work in most modern regex implementations", Perl's in particular. Python's regex implementation is modern and similar to Perl's, but it lacks the POSIX [:punct:] character class. You can build an equivalent from string.punctuation like this:

>>> import string, re
>>> pat = r'\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^%s\s]|/)))'
>>> pat = pat % re.sub(r'([-\\\]])', r'\\\1', string.punctuation)

The re.sub() call backslash-escapes the three characters in string.punctuation that are special inside a character set: the hyphen, the backslash, and the closing bracket.
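
To see what that escaping step produces, here is a minimal sketch that can be run as a script. The names escaped and not_punct are illustrative only and do not appear in the answer above.

```python
import re
import string

# Backslash-escape only '-', '\' and ']', the characters that would
# otherwise change the meaning of a [...] character set.
escaped = re.sub(r'([-\\\]])', r'\\\1', string.punctuation)
print(escaped)

# Build a negated set from the escaped text (hypothetical helper pattern).
not_punct = re.compile('[^%s]' % escaped)

# No punctuation character should match the negated set...
print(any(not_punct.match(ch) for ch in string.punctuation))  # False
# ...while an ordinary letter should.
print(bool(not_punct.match('a')))  # True
```

Everything else in string.punctuation is literal inside a set, which is why escaping three characters is enough here.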

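Edit: Using re.escape() in place of the re.sub() call works just as well. It backslash-escapes more characters than strictly necessary (nearly every non-alphanumeric character in older Python versions, only regex-special characters since Python 3.7). That felt crude to me at first, but it certainly works fine for this case. Apply it to the unformatted template, not to a pat that has already been filled in: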

>>> pat = pat % re.escape(string.punctuation)
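
Putting the pieces together, the following is a sketch of how the finished pattern might be used, assuming the re.escape() variant. TEMPLATE, URL_RE, find_urls and the sample text are hypothetical names and data chosen for illustration.

```python
import re
import string

# Template from the answer; %s is filled with an escaped punctuation set.
TEMPLATE = r'\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^%s\s]|/)))'
URL_RE = re.compile(TEMPLATE % re.escape(string.punctuation))


def find_urls(text):
    """Return the full text of every URL-like match in text."""
    # The pattern contains capturing groups, so findall() would return
    # tuples of groups; finditer() with group(0) gives the whole match.
    return [m.group(0) for m in URL_RE.finditer(text)]


sample = "See http://example.com/path, or www.example.org."
print(find_urls(sample))
# ['http://example.com/path', 'www.example.org']
```

The trailing comma and full stop are left out of the matches because the last character of a match must be a slash or a non-punctuation, non-space character, unless the match ends with a parenthesised word.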
Peter Hansen answered Oct 06 '22 00:10


I don't think Python has this expression:

[:punct:]

According to Wikipedia, [:punct:] is equivalent to:

[-!"#$%&'()*+,./:;<=>?@\[\\\]^_`{|}~]
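
A hand-written set like this is easy to get wrong, so it may be worth checking it against string.punctuation. This is a sketch only: PUNCT_CLASS is an assumed Python spelling of the ASCII punctuation set, with the brackets and the backslash escaped, and the other names are illustrative.

```python
import re
import string

# Assumption: the ASCII punctuation set written as a Python character set.
# The hyphen comes first so it is literal; '[', '\' and ']' are escaped.
PUNCT_CLASS = r'''[-!"#$%&'()*+,./:;<=>?@\[\\\]^_`{|}~]'''
punct_re = re.compile(PUNCT_CLASS)

# Collect every ASCII character the set matches.
matched = {chr(c) for c in range(128) if punct_re.fullmatch(chr(c))}

# Compare with the punctuation characters Python itself lists.
print(matched == set(string.punctuation))  # True
```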
YOU answered Oct 05 '22 23:10