Regex Matching Error

Tags:

python

regex

I am new to Python (I don't have any programming training either), so please keep that in mind as I ask my question.

I am trying to search a retrieved webpage and find all links using a specified pattern. I have done this successfully in other scripts, but I am getting an error that says

raise error, v # invalid expression

sre_constants.error: multiple repeat

I have to admit I do not know why, but again, I am new to Python and regular expressions. However, even when I don't use patterns and instead use a specific link (just to test the matching), I don't believe any matches are returned (nothing is printed to the window when I print match.group(0)). The link I tested is commented out below.

Any ideas? It usually is easier for me to learn by example, but any advice you can give is greatly appreciated!

Brock

import urllib2
from BeautifulSoup import BeautifulSoup
import re

url = "http://forums.epicgames.com/archive/index.php?f-356-p-164.html"
page = urllib2.urlopen(url).read()
soup = BeautifulSoup(page)

pattern = r'<a href="http://forums.epicgames.com/archive/index.php?t-([0-9]+).html">(.?+)</a> <i>((.?+) replies)'
#pattern = r'href="http://forums.epicgames.com/archive/index.php?t-622233.html">Gears of War 2: Horde Gameplay</a> <i>(20 replies)'

for match in re.finditer(pattern, page, re.S):
    print match.group(0)
Btibert3 asked Aug 12 '09 21:08




2 Answers

That means your regular expression has an error.

(.?+)</a> <i>((.?+)

What does ?+ mean? Both ? and + are metacharacters (quantifiers) that do not make sense right next to each other. Maybe you forgot to escape the '?' or something.
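A minimal sketch of why escaping matters, written for Python 3. It assumes a hypothetical HTML string modeled on the link from the commented-out test in the question. Used directly as a pattern, the literal link text fails to match; passed through re.escape(), which backslash-escapes every regex metacharacter, it matches itself.

```python
import re

# Hypothetical HTML modeled on the link in the question's commented-out test.
html = ('<a href="http://forums.epicgames.com/archive/index.php?t-622233.html">'
        'Gears of War 2: Horde Gameplay</a> <i>(20 replies)</i>')

# The literal text we want to find, copied as-is.
literal = ('href="http://forums.epicgames.com/archive/index.php?t-622233.html">'
           'Gears of War 2: Horde Gameplay</a> <i>(20 replies)')

# Used directly as a pattern, '?' makes the preceding 'p' optional and
# '(20 replies)' becomes a group that matches "20 replies" without the
# parentheses, so the literal text is never found.
print(re.search(literal, html))  # None

# re.escape() backslash-escapes every metacharacter, so each one matches itself.
match = re.search(re.escape(literal), html)
print(match.group(0))
```

The same idea applies inside a hand-written pattern: write `\?`, `\(` and `\)` wherever those characters should match literally. As an aside, Python 3.11 added possessive quantifiers, so newer versions accept '?+' instead of raising "multiple repeat"; '(.?+)' then matches at most one character and still would not capture a whole thread title.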

Unknown answered Sep 23 '22 02:09


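You need to escape the literal '?' and the literal '(' and ')' that you are trying to match. Strictly, the literal '.' characters should be escaped too, since an unescaped '.' matches any character.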

Also, instead of '?+', I think you're looking for the non-greedy matching provided by '+?'.

More documentation is in the Python re module docs.
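A minimal sketch of the greedy versus non-greedy difference, written for Python 3 and run on a hypothetical line that contains two links. The greedy '.+' runs to the last '</a>' it can find, while the non-greedy '.+?' stops at the first one that lets the match succeed.

```python
import re

# Hypothetical sample: two links on one line, as an index page might contain.
html = '<a href="a.html">First</a> <a href="b.html">Second</a>'

# Greedy: '.+' consumes as much as possible, so one match spans both links.
greedy = re.findall(r'<a href="[^"]+">(.+)</a>', html)

# Non-greedy: '.+?' consumes as little as possible, giving one match per link.
lazy = re.findall(r'<a href="[^"]+">(.+?)</a>', html)

print(greedy)  # ['First</a> <a href="b.html">Second']
print(lazy)    # ['First', 'Second']
```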

For your case, try this:

pattern = r'<a href="http://forums\.epicgames\.com/archive/index\.php\?t-([0-9]+)\.html">(.+?)</a> <i>\((.+?) replies\)'
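A minimal offline sketch of the corrected pattern in use, written for Python 3. The HTML snippet and the second thread ID are hypothetical stand-ins for the downloaded page, and the literal '.' characters are escaped as well so they cannot match arbitrary characters. match.groups() returns the thread ID, title and reply count captured by the three groups.

```python
import re

# Corrected pattern: '?', '.', '(' and ')' escaped; '+?' for non-greedy groups.
pattern = (r'<a href="http://forums\.epicgames\.com/archive/index\.php\?t-([0-9]+)\.html">'
           r'(.+?)</a> <i>\((.+?) replies\)')

# Hypothetical snippet shaped like the archive page's thread list.
page = ('<li><a href="http://forums.epicgames.com/archive/index.php?t-622233.html">'
        'Gears of War 2: Horde Gameplay</a> <i>(20 replies)</i></li>\n'
        '<li><a href="http://forums.epicgames.com/archive/index.php?t-622240.html">'
        'Another thread</a> <i>(3 replies)</i></li>')

threads = []
for match in re.finditer(pattern, page, re.S):
    # Each match exposes the three captured groups: ID, title, reply count.
    thread_id, title, replies = match.groups()
    threads.append((int(thread_id), title, int(replies)))
    print(match.group(0))  # the whole matched link text

print(threads)
```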
retracile answered Sep 23 '22 02:09