I need to extract every letter that comes directly after a + sign or at the beginning of a string like this:
formula = "X+BC+DAF"
I tried the following, but I do not want to see the + sign in the result. I want to see only ['X', 'B', 'D'].
>>> re.findall("^[A-Z]|[+][A-Z]", formula)
['X', '+B', '+D']
When I grouped with parentheses, I got this strange result:
>>> re.findall("^([A-Z])|[+]([A-Z])", formula)
[('X', ''), ('', 'B'), ('', 'D')]
Why did it create tuples when I tried to group? How to write the regexp directly such that it returns ['X', 'B', 'D']?
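If there are any capturing groups in the regular expression, then re.findall returns only the values captured by the groups. If there are no groups, the entire matched string is returned. With two groups, each match becomes a tuple with one entry per group, and a group that did not take part in the match shows up as an empty string.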
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
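To make the quoted rule concrete, here is a small sketch that runs three variants of the pattern on the same formula: no group, one group, and two groups. The variable names are only for illustration.

```python
import re

formula = "X+BC+DAF"

# No groups: each item is the whole match, so the "+" is kept.
no_groups = re.findall(r"^[A-Z]|\+[A-Z]", formula)

# One group: each item is just the text captured by that group.
one_group = re.findall(r"\+([A-Z])", formula)

# Two groups: each item is a tuple with one slot per group;
# the group that did not take part in the match is "".
two_groups = re.findall(r"^([A-Z])|\+([A-Z])", formula)

print(no_groups)   # ['X', '+B', '+D']
print(one_group)   # ['B', 'D']
print(two_groups)  # [('X', ''), ('', 'B'), ('', 'D')]
```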
How to write the regexp directly such that it returns ['X', 'B', 'D']?
Instead of two capturing groups, put the alternation in a non-capturing group and keep a single capturing group for the letter:
>>> re.findall(r"(?:^|\+)([A-Z])", formula)
['X', 'B', 'D']
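If you would rather have no capturing group at all, a lookbehind may also work. This is a sketch of an alternative, not something from the documentation quoted above; it assumes the same formula as in the question.

```python
import re

formula = "X+BC+DAF"

# (?<=\+) is a lookbehind: it checks that a "+" precedes the letter
# without making the "+" part of the match, so no group is needed.
first_letters = re.findall(r"(?:^|(?<=\+))[A-Z]", formula)

print(first_letters)  # ['X', 'B', 'D']
```

Because the pattern contains no capturing group, re.findall returns the whole match, which here is just the letter.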
Or for this specific case you could try a simpler solution using a word boundary:
>>> re.findall(r"\b[A-Z]", formula)
['X', 'B', 'D']
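Note that the word boundary works here only because + is a non-word character. Any other non-word separator triggers it too, as this sketch with a made-up input (the string with "-" is hypothetical, not from the question) suggests:

```python
import re

# Hypothetical input: "-" is also a non-word character, so \b fires after it.
other = "X-BC+DAF"

by_boundary = re.findall(r"\b[A-Z]", other)
by_plus = re.findall(r"(?:^|\+)([A-Z])", other)

print(by_boundary)  # ['X', 'B', 'D']
print(by_plus)      # ['X', 'D']
```

If the input can contain separators other than +, the explicit (?:^|\+) version is probably the safer choice.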
Or a solution using str.split that doesn't use regular expressions:
>>> [s[0] for s in formula.split('+')]
['X', 'B', 'D']
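One caveat with the split approach: s[0] raises IndexError on an empty piece, which happens if the string has a leading, trailing, or doubled +. A minimal sketch that skips empty pieces, assuming that is the behavior you want (first_chars and the messy input are hypothetical names for illustration):

```python
formula = "X+BC+DAF"
messy = "X++BC+"  # hypothetical input with empty terms


def first_chars(text):
    # Skip empty pieces so s[0] is never applied to "".
    return [s[0] for s in text.split('+') if s]


print(first_chars(formula))  # ['X', 'B', 'D']
print(first_chars(messy))    # ['X', 'B']
```

Unlike the regular expressions above, this takes the first character of each piece whatever it is, not only an uppercase letter.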