
Python regex - Ignore parenthesis as indexing?

I've written a beginner-level regex pattern that makes heavy use of the "(" and ")" characters, but I'm only using them to group 'or' alternatives, such as (A|B|C) meaning A or B or C.

I need to find every match of the pattern in a string.
Using re.findall(pattern, text) doesn't work, since it interprets the parentheses as capturing groups (or whatever the correct jargon is), so each element of the resulting list is not a string containing the matched text but a tuple of very ugly fragments of the match.

Is there an argument I can pass to findall to stop it treating parentheses as capturing groups?
Or will I have to use a very ugly combination of re.search and re.sub?

(This is the only solution I can think of: find the index of the re.search match, add the matched section of text to the list, then remove it from the original string {using ugly index tricks}, and continue until there are no more matches. Obviously, this is horrible and undesirable.)

Thanks!

Anti Earth asked Aug 16 '12 10:08




1 Answer

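Yes: add ?: right after a group's opening parenthesis to make it non-capturing. re.findall only returns the captured groups (as tuples when there is more than one) if the pattern contains capturing groups; with none, it returns each whole match as a string.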

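import re
# Both groups capture, so findall returns a tuple of the groups per match.
print(re.findall(r'(.(foo))', "Xfoo"))    # [('Xfoo', 'foo')]
# The inner group is non-capturing, so only the outer group is returned.
print(re.findall(r'(.(?:foo))', "Xfoo"))  # ['Xfoo']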
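Applied to the kind of 'or' groups described in the question, it might look like the sketch below. The pattern and sample text are made up for illustration, since the question's real pattern isn't shown.

```python
import re

# Hypothetical sample text and patterns, not taken from the question.
text = "cats dog birds cat"

# Two capturing groups: findall returns one tuple of groups per match.
capturing = re.findall(r"(cat|dog)(s|es)?", text)

# Same alternations as non-capturing groups: findall returns whole matches.
non_capturing = re.findall(r"(?:cat|dog)(?:s|es)?", text)

print(capturing)      # [('cat', 's'), ('dog', ''), ('cat', '')]
print(non_capturing)  # ['cats', 'dog', 'cat']
```

Every group used purely for alternation needs the ?: prefix; a single remaining capturing group is enough to make findall return that group instead of the whole match.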

See the regular expression syntax section of the re module documentation for more information.
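If rewriting the pattern isn't practical, another option (not part of the answer above, just a possible alternative) is re.finditer, which yields match objects; group(0) is always the full matched text, whatever groups the pattern contains. A minimal sketch with a made-up pattern and text:

```python
import re

# Hypothetical sample text and pattern, used only for illustration.
text = "cats dog birds cat"
pattern = r"(cat|dog)(s|es)?"  # capturing groups left unchanged

# group(0) is the entire match, independent of any capturing groups.
matches = [m.group(0) for m in re.finditer(pattern, text)]

print(matches)  # ['cats', 'dog', 'cat']
```

This avoids the search-and-remove loop described in the question, since finditer already walks through all non-overlapping matches from left to right.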

georg answered Sep 18 '22 15:09