Python regex not working

Tags: python, regex

I am using the following code:

downloadlink = re.findall("http://uploadir.com/u/(.*)\b", str(downloadhtml))

However, when I pass it the following string:

<input type="text" value="http://uploadir.com/u/bb41c5b3" />

It finds nothing, but I expect it to find http://uploadir.com/u/bb41c5b3. What am I doing wrong?

I have tested the regex using http://gskinner.com/RegExr/ and it seems to be correct. Am I missing something here?

asked Jan 15 '11 17:01

matthewgall

2 Answers

Get in the habit of writing all regex patterns as raw strings:

In [16]: re.findall("http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[16]: []

In [17]: re.findall(r"http://uploadir.com/u/(.*)\b", '<input type="text" value="http://uploadir.com/u/bb41c5b3" />')
Out[17]: ['bb41c5b3']

The difference is due to \b being interpreted differently:

In [18]: '\b'
Out[18]: '\x08'

In [19]: r'\b'
Out[19]: '\\b'

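'\b' is a single ASCII backspace character, while r'\b' is a two-character string: a backslash followed by a b. Only the two-character form reaches the re module as the word-boundary escape. With the non-raw pattern, the regex was looking for a literal backspace character, which never appears in the HTML, so nothing matched.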
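As a small sketch of the point above (the variable names are illustrative and not from the original post), the three spellings of the pattern can be compared directly. A doubled backslash in a normal string should give the same pattern text as the raw string:

```python
import re

html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'

# Normal string: Python turns "\b" into one backspace character (\x08)
plain = "http://uploadir.com/u/(.*)\b"
# Raw string: the backslash and the "b" both reach the regex engine
raw = r"http://uploadir.com/u/(.*)\b"
# Normal string with a doubled backslash: same text as the raw string
escaped = "http://uploadir.com/u/(.*)\\b"

print(len(raw) - len(plain))    # the raw pattern is one character longer
print(raw == escaped)           # both spell the same pattern
print(re.findall(plain, html))  # no backspace in the HTML, so no match
print(re.findall(raw, html))    # word boundary after the ID, so it matches
```

Raw strings are usually the easier habit, since the pattern then reads the same way in the source as it does to the regex engine.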

answered Oct 20 '22 08:10

unutbu

>>> import re
>>> html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" />'
>>> # Raw string; capture everything up to the closing double quote
>>> regex = r'http://uploadir\.com/u/([^"]+)'
>>> link = re.findall(regex, html)
>>> link
['bb41c5b3']
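One likely reason to prefer the negated character class here is that a greedy (.*) followed by \b backtracks only as far as the last word boundary on the line. The sketch below uses a hypothetical variant of the input, with an extra name attribute that is not in the original question, to show the difference:

```python
import re

# Hypothetical input: the same tag with one more attribute after the URL
html = '<input type="text" value="http://uploadir.com/u/bb41c5b3" name="link" />'

# Greedy .* runs to the end of the line, then backs up to the LAST word boundary
greedy = re.findall(r'http://uploadir\.com/u/(.*)\b', html)
# [^"]+ stops at the first double quote, i.e. the end of the attribute value
bounded = re.findall(r'http://uploadir\.com/u/([^"]+)', html)

print(greedy)   # ['bb41c5b3" name="link']
print(bounded)  # ['bb41c5b3']
```

On the exact string from the question both patterns return the same result, so the difference only shows up when more text follows the URL on the same line.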

answered Oct 20 '22 07:10

joksnet
