I ran into a small problem using Python Regex.
Suppose this is the input:
(zyx)bc
What I'm trying to achieve is to obtain whatever is between the parentheses as a single match, and each character outside them as an individual match. The desired result would be along the lines of:
['zyx','b','c']
The order of matches should be kept.
I've tried obtaining this with Python 3.3, but can't seem to figure out the correct Regex. So far I have:
from re import findall
matches = findall(r'\((.*?)\)|\w', '(zyx)bc')
print(matches)
yields the following:
['zyx','','']
Any ideas what I'm doing wrong?
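We can write both variants in a regexp using alternation: [01]\d|2[0-3]. Next, minutes must be from 00 to 59. In the regular expression language that can be written as [0-5]\d: the first digit 0-5, and then any digit. If we simply glue hours and minutes together, we get the pattern [01]\d|2[0-3]:[0-5]\d, but alternation has the lowest precedence, so this reads as "[01]\d or 2[0-3]:[0-5]\d". The hours need to be wrapped in a group: ([01]\d|2[0-3]):[0-5]\d.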
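A quick way to see how alternation precedence affects the glued pattern is to compare it with a grouped version. This is only a sketch: the names `naive` and `grouped` are made up for illustration, and `fullmatch` assumes Python 3.4 or later.

```python
import re

# Glued together without a group: "|" splits the WHOLE pattern in two.
naive = re.compile(r'[01]\d|2[0-3]:[0-5]\d')

# Hours wrapped in a non-capturing group, so ":MM" applies to both variants.
grouped = re.compile(r'(?:[01]\d|2[0-3]):[0-5]\d')

for text in ('12', '12:34', '23:59'):
    print(text,
          naive.fullmatch(text) is not None,
          grouped.fullmatch(text) is not None)
```

With the naive pattern a bare '12' is accepted and '12:34' is rejected, while the grouped pattern behaves the other way round.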
The Alternation Operator (| or \|): alternatives match one of a choice of regular expressions. If you put the character(s) representing the alternation operator between any two regular expressions a and b, the result matches the union of the strings that a and b match.
Using special characters: for example, to match a single "a" followed by zero or more "b"s followed by "c", you'd use the pattern /ab*c/: the * after "b" means "0 or more occurrences of the preceding item."
Use | (pipe) operator to specify multiple patterns.
From the documentation of re.findall:
If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
While your regexp is matching the string three times, the (.*?) group is empty for the last two matches, because those come from the \w alternative, which is not inside a group. If you want the output of the other half of the regexp, you can add a second group:
>>> re.findall(r'\((.*?)\)|(\w)', '(zyx)bc')
[('zyx', ''), ('', 'b'), ('', 'c')]
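If you go with the two-group version, one way to get back to a flat list is to keep whichever group is non-empty in each tuple. A minimal sketch (the helper name `split_tokens` is just something I made up for illustration):

```python
import re

def split_tokens(text):
    # Each match is a tuple (inside_parens, single_char); one of them is ''.
    pairs = re.findall(r'\((.*?)\)|(\w)', text)
    # Keep whichever side of the alternation actually matched.
    return [inside or outside for inside, outside in pairs]

print(split_tokens('(zyx)bc'))  # ['zyx', 'b', 'c']
```

Note that an empty pair of parentheses, "()", would come out as an empty string with this approach.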
Alternatively, you could remove all the groups to get a simple list of strings again:
>>> re.findall(r'\(.*?\)|\w', '(zyx)bc')
['(zyx)', 'b', 'c']
You would need to manually remove the parentheses though.
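Removing the parentheses afterwards could look something like this. It is a sketch that assumes every match starting with "(" is a complete parenthesised group, which holds for this pattern:

```python
import re

raw = re.findall(r'\(.*?\)|\w', '(zyx)bc')

# Drop the first and last character only for matches that came from
# the parenthesised alternative; single characters pass through as-is.
tokens = [m[1:-1] if m.startswith('(') else m for m in raw]

print(tokens)  # ['zyx', 'b', 'c']
```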
Other answers have shown you how to get the result you need, but with the extra step of manually removing the parentheses. If you use lookarounds in your regex, you won't need to strip the parentheses manually:
>>> import re
>>> s = '(zyx)bc'
>>> print(re.findall(r'(?<=\()\w+(?=\))|\w', s))
['zyx', 'b', 'c']
Explained:
(?<=\() // lookbehind for left parenthesis
\w+ // one or more word characters, up to:
(?=\)) // lookahead for right parenthesis
| // OR
\w // any single word character
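The same breakdown can be kept inside the pattern itself with re.VERBOSE, which ignores whitespace and allows # comments. A small sketch (the variable name `pattern` is arbitrary):

```python
import re

pattern = re.compile(r"""
    (?<=\()   # lookbehind: preceded by a left parenthesis
    \w+       # one or more word characters
    (?=\))    # lookahead: followed by a right parenthesis
    |         # OR
    \w        # a single word character
""", re.VERBOSE)

print(pattern.findall('(zyx)bc'))  # ['zyx', 'b', 'c']
```

Keep in mind that \w only covers letters, digits and the underscore, so content such as "(x-y)" would not be captured as a single match by this pattern.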