#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
text = "aaaa[ab][cd][ef]"
a = re.compile(r"^(\w+)(\[\w+\])*$").findall(text)
print a
I need all of them, but it returns:
[('aaaa', '[ef]')]
With:
a = re.compile(r"\[\w+\]").findall(text)
I get all of the bracketed groups, but the first word is left out:
['[ab]', '[cd]', '[ef]']
Here is how you can do it:
In [14]: a = re.compile(r"(\w+|\[\w+\])").findall(text)
In [15]: print a
['aaaa', '[ab]', '[cd]', '[ef]']
Each match returns one group of letters (with or without brackets).
There is only one match: the "^(\w+)" part matches "aaaa" and the "(\[\w+\])*$" part matches "[ab][cd][ef]". Note that you get a list of one element (which is a tuple), so there's only one match. Each pair of parentheses you use in the regexp generates an element in the tuple, with the text that matched whatever was inside them. There are two pairs, so there are two elements in the tuple. The second pair of parentheses is starred, but that only causes that result to be "assigned" multiple times (which appears to keep the last value): it does not multiply the parentheses themselves, so you don't get a larger tuple.
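To make the point about the starred group concrete, here is a small sketch using the pattern and text from the question. The variable names (pattern, first_word, last_bracket, group_count) are only illustrative.

```python
import re

text = "aaaa[ab][cd][ef]"

# Two pairs of capturing parentheses, so each match yields a 2-tuple,
# no matter how many times the starred group repeats.
pattern = re.compile(r"^(\w+)(\[\w+\])*$")

match = pattern.search(text)
first_word = match.group(1)    # text captured by the first pair
last_bracket = match.group(2)  # only the final repetition is kept
group_count = pattern.groups   # number of capturing groups in the pattern

print("%s %s %d" % (first_word, last_bracket, group_count))
```

The earlier repetitions, "[ab]" and "[cd]", are consumed by the match but are not kept in any group.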
I'm not sure what you expect, so I don't know what regexp to suggest.
Based on your comment on aix's answer, it appears that you want to require the non-bracketed part to match. Maybe something like this is what you are looking for?
>>> a = re.compile(r"^(\w+)((?:\[\w+\])*)").findall(text)
>>> print a
[('aaaa', '[ab][cd][ef]')]
If you need to get the result ['aaaa', '[ab]', '[cd]', '[ef]'] instead of what is shown above, here is one method:
>>> match = re.compile(r"^(\w+)((?:\[\w+\])*)").search(text)
>>> a = [match.group(1)] + match.group(2).replace("][", "] [").split()
>>> print a
['aaaa', '[ab]', '[cd]', '[ef]']
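As a possible alternative to the replace/split step, the captured bracket run can be passed to a second findall. This is only a sketch: the helper name split_word_and_brackets is made up for illustration, and the trailing "$" is an added assumption (not in the pattern above) that rejects input with anything after the last bracket.

```python
import re

text = "aaaa[ab][cd][ef]"

def split_word_and_brackets(s):
    # Hypothetical helper: require a leading word, then zero or more
    # bracketed groups. The "$" anchor is an assumption added here.
    m = re.match(r"^(\w+)((?:\[\w+\])*)$", s)
    if m is None:
        return None  # no leading word, or unexpected trailing text
    # Group 2 holds the whole bracket run; findall splits it into pieces.
    return [m.group(1)] + re.findall(r"\[\w+\]", m.group(2))

result = split_word_and_brackets(text)
print(result)
```

With this shape, input that lacks the leading word (for example "[ab][cd]") returns None instead of a partial list.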