I'm trying to remove the parenthesized parts of the strings below, but I can't get a regex working :(
x (LOC)
ds ds (32C)
d'ds ds (LeC)
ds-d da(LOQ)
12345 (deC)
[ \(\w+\)]
http://regex101.com/r/bD8fE2
import re

items = ["x (LOC)", "ds ds (32C)", "d'ds ds (LeC)", "ds-d da(LOQ)", "12345 (deC)"]
for item in items:
    item = re.sub(r"[ \(\w+\)]", "", item)
    print(item)
Since parentheses are also used for capturing and non-capturing groups, a literal opening parenthesis has to be escaped with a backslash. In a regex literal, / opens the regex and \( matches a single literal opening parenthesis.
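As a small illustration in Python (the sample string is taken from the question, the variable names are made up for this sketch), an escaped parenthesis matches the literal character, while a lone unescaped one is read as the start of a group and does not compile:

```python
import re

sample = "ds-d da(LOQ)"

# Escaped: \( matches a literal opening parenthesis.
without_open = re.sub(r"\(", "", sample)
print(without_open)  # ds-d daLOQ)

# Unescaped: "(" opens a group that is never closed, so compiling fails.
try:
    re.compile("(")
    compiled = True
except re.error:
    compiled = False
print(compiled)  # False
```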
By placing part of a regular expression inside round brackets or parentheses, you can group that part of the regular expression together. This allows you to apply a quantifier to the entire group or to restrict alternation to part of the regex. Only parentheses can be used for grouping.
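A minimal sketch of both uses of grouping, with made-up test strings: a quantifier applied to a whole group, and alternation restricted to part of the pattern.

```python
import re

# Quantifier applied to the whole group: "ab" repeated one or more times.
grouped = re.match(r"^(ab)+$", "ababab")
print(grouped is not None)  # True

# Without the group, + applies only to "b", so the same string fails.
ungrouped = re.match(r"^ab+$", "ababab")
print(ungrouped is not None)  # False

# Alternation limited to the middle of the word: "gray" or "grey".
# (?:...) groups without capturing, so findall returns whole matches.
colours = re.findall(r"gr(?:a|e)y", "gray grey groy")
print(colours)  # ['gray', 'grey']
```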
In other words, square brackets match exactly one character. ([a-z][0-9]) will match two characters: the first is one of abcdefghijklmnopqrstuvwxyz, the second is one of 0123456789, just as if the parentheses weren't there. The () let you read back exactly which characters were matched.
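The difference between a character class and a group can be checked quickly in Python. This is only a sketch; the input "a1" is an arbitrary example.

```python
import re

# A character class matches exactly one character from the set.
one = re.match(r"[a-z0-9]", "a1")
print(one.group(0))  # a

# Two classes in a row match two characters; the parentheses only
# group and capture what was matched.
two = re.match(r"([a-z][0-9])", "a1")
print(two.group(1))  # a1

# With one group per class, each matched character can be read back.
separate = re.match(r"([a-z])([0-9])", "a1")
print(separate.groups())  # ('a', '1')
```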
Remove the square brackets; you are not matching a character class:
item = re.sub(r" \(\w+\)", "", item)
Demo:
>>> import re
>>> items = ["x (LOC)", "ds ds (32C)", "d'ds ds (LeC)", "ds-d da(LOQ)", "12345 (deC)"]
>>> for item in items:
...     print(re.sub(r" \(\w+\)", "", item))
...
x
ds ds
d'ds ds
ds-d da(LOQ)
12345
The second-to-last example has no space preceding the opening parenthesis, "(", and thus doesn't match. You could make the space optional if you need that pattern to work too:
item = re.sub(r" ?\(\w+\)", "", item)
Perhaps matching anything that isn't a closing parenthesis would work for you as well:
item = re.sub(r" ?\([^)]+\)", "", item)
This matches a wider range of characters than just \w.
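To compare the three patterns side by side, here is a short sketch. The helper name strip_parens and the extra test string are hypothetical, added only for illustration.

```python
import re

items = ["x (LOC)", "ds ds (32C)", "d'ds ds (LeC)", "ds-d da(LOQ)", "12345 (deC)"]
extra = "foo (a-b c)"  # hypothetical item with non-word characters inside

def strip_parens(pattern, text):
    """Remove every match of pattern from text."""
    return re.sub(pattern, "", text)

required_space = [strip_parens(r" \(\w+\)", item) for item in items]
optional_space = [strip_parens(r" ?\(\w+\)", item) for item in items]
any_content = [strip_parens(r" ?\([^)]+\)", item) for item in items]

print(required_space)  # "ds-d da(LOQ)" is left untouched
print(optional_space)  # "ds-d da(LOQ)" becomes "ds-d da"
print(any_content)     # same result as optional_space for these items

# Only the [^)]+ variant handles "-" and spaces inside the parentheses.
print(strip_parens(r" ?\(\w+\)", extra))     # foo (a-b c)
print(strip_parens(r" ?\([^)]+\)", extra))   # foo
```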
In a regular expression, square brackets, [...], denote a character class: a set of characters to match once. The class [ \(\w+\)] means: match one character, if it is in the set consisting of a space, an opening parenthesis, all characters of the \w class, a + plus, or a closing parenthesis.
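Running the character-class version against the sample strings shows the effect: every single character in the set is removed, wherever it appears. The reordered pattern below is only an illustration that order inside the class does not matter.

```python
import re

items = ["x (LOC)", "ds ds (32C)", "d'ds ds (LeC)", "ds-d da(LOQ)", "12345 (deC)"]

# The original attempt: a character class, so each matching character
# is deleted individually. Only "'" and "-" are outside the set.
broken = [re.sub(r"[ \(\w+\)]", "", item) for item in items]
print(broken)  # ['', '', "'", '-', '']

# The same set written in a different order behaves identically.
reordered = [re.sub(r"[)+\w( ]", "", item) for item in items]
print(reordered == broken)  # True
```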
Anything within square brackets is taken irrespective of the order of the characters, because [ ... ] is a character class. Remove the brackets entirely:
r" \(\w+\)"
And I would add a ? for an optional space:
r" ?\(\w+\)"