python: regex only gets the last occurrence

Tags: python, regex

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import re

text = "aaaa[ab][cd][ef]"

a = re.compile(r"^(\w+)(\[\w+\])*$").findall(text)

print a

I need all of them, but it returns:

[('aaaa', '[ef]')]

With:

a = re.compile(r"\[\w+\]").findall(text)

I get all of the bracketed parts, but the first word is missing:

['[ab]', '[cd]', '[ef]']


ZiTAL asked Feb 01 '12 22:02


3 Answers

Here is how you can do it:

In [14]: a = re.compile(r"(\w+|\[\w+\])").findall(text)

In [15]: print a
['aaaa', '[ab]', '[cd]', '[ef]']

Each match returns one group of letters (with or without brackets).
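
For reference, here is a minimal sketch of the same alternation as a standalone script. The helper name `split_tokens` is made up for illustration, and `print()` is used so the script also runs on Python 3. Note that this pattern only collects tokens; it does not check that the string starts with an unbracketed word.

```python
import re

# Either a bare word or a bracketed word; each match is one token.
TOKEN = re.compile(r"\w+|\[\w+\]")


def split_tokens(text):
    """Hypothetical helper: return every bare or bracketed word, in order."""
    return TOKEN.findall(text)


print(split_tokens("aaaa[ab][cd][ef]"))  # ['aaaa', '[ab]', '[cd]', '[ef]']
```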

like image 189
NPE Avatar answered Nov 01 '22 18:11

NPE


There is only one match: the "^(\w+)" part matches "aaaa" and the "(\[\w+\])*$" part matches "[ab][cd][ef]". Note that you get a list of one element (which is a tuple), so there is only one match. Each pair of parentheses in the regexp produces one element of the tuple, holding the text matched by whatever is inside them. There are two pairs, so the tuple has two elements. The second pair of parentheses is starred, but that only causes its result to be "assigned" once per repetition, and only the last value is kept. The star does not multiply the parentheses themselves, so you don't get a larger tuple.

I'm not sure what you expect, so I don't know what regexp to suggest.
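
The behaviour described above can be checked with a short sketch (using `re.match` and `print()` here is my choice, not part of the question's code):

```python
import re

text = "aaaa[ab][cd][ef]"
m = re.match(r"^(\w+)(\[\w+\])*$", text)

# Two pairs of parentheses give two groups, however often the second repeats.
print(m.groups())  # ('aaaa', '[ef]')

# The overall match still covers every bracketed part.
print(m.group(0))  # aaaa[ab][cd][ef]

# Group 2 only remembers its final repetition, the last four characters.
print(m.span(2))   # (12, 16)
```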

cvoinescu answered Nov 01 '22 18:11


Based on your comment on aix's answer, it appears that you want to require the non-bracketed part to match. Maybe something like this is what you are looking for?

>>> a = re.compile(r"^(\w+)((?:\[\w+\])*)").findall(text)
>>> print a
[('aaaa', '[ab][cd][ef]')]

If you need to get the result ['aaaa', '[ab]', '[cd]', '[ef]'] instead of what is shown above, here is one method:

>>> match = re.compile(r"^(\w+)((?:\[\w+\])*)").search(text)
>>> a = [match.group(1)] + match.group(2).replace("][", "] [").split()
>>> print a
['aaaa', '[ab]', '[cd]', '[ef]']
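
A possible variation, sketched under a few assumptions: instead of the replace/split step, run a second `findall` over the bracketed tail. The helper name `parse` is hypothetical, and the trailing `$` is my addition (taken from the question's original pattern) so that trailing junk is rejected.

```python
import re

# Leading word, then zero or more bracketed words captured as one string.
# The trailing "$" is an assumption carried over from the question's pattern.
HEAD = re.compile(r"^(\w+)((?:\[\w+\])*)$")
BRACKETED = re.compile(r"\[\w+\]")


def parse(text):
    """Hypothetical helper: return [word, '[x]', ...], or None if no match."""
    match = HEAD.search(text)
    if match is None:
        return None
    # Split the captured tail into its individual bracketed parts.
    return [match.group(1)] + BRACKETED.findall(match.group(2))


print(parse("aaaa[ab][cd][ef]"))  # ['aaaa', '[ab]', '[cd]', '[ef]']
print(parse("[ab][cd]"))          # None, the leading word is required
```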

Andrew Clark answered Nov 01 '22 19:11