
Python regex not working

Tags: python, regex

I am using the following code:

downloadlink = re.findall("http://uploadir.com/u/(.*)\b", str(downloadhtml))

However, when I pass it the following string:

<input type="text" value="http://uploadir.com/u/bb41c5b3" />

It finds nothing, when I'm expecting it to find: http://uploadir.com/u/bb41c5b3. What am I doing wrong?

I have tested the regex using http://gskinner.com/RegExr/ and it seems to be correct. Am I missing something here?

matthewgall asked Jan 15 '11 17:01



2 Answers

Get in the habit of making all regex patterns with raw strings:

In [16]: re.findall("http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[16]: []

In [17]: re.findall(r"http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[17]: ['bb41c5b3']

The difference is due to \b being interpreted differently:

In [18]: '\b'
Out[18]: '\x08'

In [19]: r'\b'
Out[19]: '\\b'

'\b' is the ASCII backspace character, while r'\b' is a string of two characters, a backslash and a b, which the re module interprets as a word boundary.
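As a self-contained check of the explanation above, the following sketch runs both versions of the pattern against the input from the question. The variable names (`html`, `plain`, `raw`) are illustrative and not part of the original code.

```python
import re

# Input string taken from the question.
html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'

# Without the r prefix, Python turns \b into a backspace character
# before the re module ever sees the pattern.
plain = "http://uploadir.com/u/(.*)\b"

# With the r prefix, the backslash survives and re reads \b as a word boundary.
raw = r"http://uploadir.com/u/(.*)\b"

print(len("\b"), len(r"\b"))    # 1 2
print(re.findall(plain, html))  # []
print(re.findall(raw, html))    # ['bb41c5b3']

# Doubling the backslash in a normal string gives the same pattern as the raw string.
print("\\b" == r"\b")           # True
```

The greedy `(.*)` first consumes the rest of the string and then backtracks to the last position where a word boundary exists, which here is right after `bb41c5b3`.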

unutbu answered Oct 20 '22 08:10


>>> import re
>>> html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'
>>> # [^"]+ captures everything up to the closing double quote
>>> regex = r'http://uploadir.com/u/([^"]+)'
>>> link = re.findall(regex, html)
>>> link
['bb41c5b3']
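The session above returns only the captured group, whereas the question expected the full URL. One possible variant, sketched here with illustrative names (`pattern`, `links`, `match`), drops the capturing group so that `re.findall` returns the whole match. It also escapes the dots so they match only a literal "."; that is an optional tightening, not something the original answer requires.

```python
import re

html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'

# No capturing group: findall returns the entire matched text.
# \. matches a literal dot; [^"]+ stops at the closing double quote.
pattern = re.compile(r'http://uploadir\.com/u/[^"]+')
links = pattern.findall(html)
print(links)  # ['http://uploadir.com/u/bb41c5b3']

# With a group, re.search gives access to both the full URL and the ID.
match = re.search(r'http://uploadir\.com/u/([^"]+)', html)
if match:
    print(match.group(0))  # http://uploadir.com/u/bb41c5b3
    print(match.group(1))  # bb41c5b3
```

When only one link is expected, `re.search` avoids building a list and makes the "no match" case explicit through the `None` check.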
joksnet answered Oct 20 '22 07:10