I am using the following code:
downloadlink = re.findall("http://uploadir.com/u/(.*)\b", str(downloadhtml))
However, when I pass it the following string:
<input type="text" value="http://uploadir.com/u/bb41c5b3" />
It finds nothing, when I'm expecting it to find: http://uploadir.com/u/bb41c5b3. What am I doing wrong?
I have tested the regex using http://gskinner.com/RegExr/ and it seems to be correct. Am I missing something here?
Python's re module provides regular expression support; import it before use. It defines functions and constants for common regex tasks such as searching, search-and-replace, and checking whether a string contains a given pattern.
Get in the habit of making all regex patterns with raw strings:
In [16]: re.findall("http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[16]: []
In [17]: re.findall(r"http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[17]: ['bb41c5b3']
The difference is due to \b being interpreted differently:
In [18]: '\b'
Out[18]: '\x08'
In [19]: r'\b'
Out[19]: '\\b'
'\b' is an ASCII Backspace, while r'\b' is a string composed of the two characters, a backslash and a b.
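To make the difference concrete, here is a small sketch that compares the two literals and runs both patterns against the input from the question. The variable names (plain_pattern, raw_pattern, and so on) are only illustrative.

```python
import re

html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'

# Without the r prefix, Python turns \b into a single backspace character
# before the regex engine ever sees the pattern.
plain_pattern = "http://uploadir.com/u/(.*)\b"
# With the r prefix, the backslash and the b both reach the regex engine,
# which reads them as a word-boundary assertion.
raw_pattern = r"http://uploadir.com/u/(.*)\b"

print(len("\b"), len(r"\b"))        # 1 2
print(plain_pattern == raw_pattern)  # False

plain_matches = re.findall(plain_pattern, html)
raw_matches = re.findall(raw_pattern, html)
print(plain_matches)  # []
print(raw_matches)    # ['bb41c5b3']
```

The non-raw pattern can only match text that contains a literal backspace, which the HTML does not, so findall returns an empty list.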
>>> import re
>>> html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'
>>> regex = r'http://uploadir.com/u/([^"]+)'
>>> link = re.findall(regex, html)
>>> link
['bb41c5b3']
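A possible variation on the pattern above, offered as a sketch rather than a definitive fix: escaping the dots makes them match only a literal ".", and dropping the capture group makes findall return the whole URL, which is what the question said it expected. The names id_pattern and url_pattern are made up for this example.

```python
import re

html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'

# Escaped dots match only a literal "."; [^"]+ stops at the closing quote
# of the attribute value.
id_pattern = r'http://uploadir\.com/u/([^"]+)'
# With no capture group, findall returns the full text of each match.
url_pattern = r'http://uploadir\.com/u/[^"]+'

ids = re.findall(id_pattern, html)
urls = re.findall(url_pattern, html)
print(ids)   # ['bb41c5b3']
print(urls)  # ['http://uploadir.com/u/bb41c5b3']
```

When a pattern contains exactly one group, findall returns only that group's text, which is why the earlier examples yield the ID rather than the full link.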