I am new to python and was wondering if there was a better solution to match all forms of URLs that might be found in a given string. Upon googling, there seems to a lot of solutions that extract domains, replace it with links etc, but none that removes / deletes them from a string. I have mentioned some examples below for reference. Thanks!
str = 'this is some text that will have one form or the other url embeded, most will have valid URLs while there are cases where they can be bad. for eg, http://www.google.com and http://www.google.co.uk and www.domain.co.uk and etc.'
URLless_string = re.sub(r'(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|
(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))', '', thestring)
print '==' + URLless_string + '=='
Error Log:
C:\Python27>python test.py
File "test.py", line 7
SyntaxError: Non-ASCII character '\xab' in file test.py on line 7, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
sub(r'http\S+', '', my_string) . The re. sub() method will remove any URLs from the string by replacing them with empty strings.
To remove a hyperlink but keep the text, right-click the hyperlink and click Remove Hyperlink. To remove the hyperlink completely, select it and then press Delete.
URL extraction is achieved from a text file by using regular expression. The expression fetches the text wherever it matches the pattern. Only the re module is used for this purpose.
Include encoding line at the top of your source file(the regex string contains non-ascii symbols like »
), e.g.:
# -*- coding: utf-8 -*-
import re
...
Also surround your regex string in triple single(or double)quotes - '''
or """
instead of single as this string already contains quote symbols itself('
and "
).
r'''(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))'''
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With