 

Regex for removing data in parenthesis

I'm trying to remove the parenthesized parts of the strings below, but I can't get a regex working :(

Data:

x (LOC)
ds ds (32C)
d'ds ds (LeC)
ds-d da(LOQ)
12345 (deC)

Regex tried:

[ \(\w+\)]

Regex101:

http://regex101.com/r/bD8fE2

Example code

import re

items = ["x (LOC)", "ds ds (32C)", "d'ds ds (LeC)", "ds-d da(LOQ)", "12345 (deC)"]
for item in items:
    item = re.sub(r"[ \(\w+\)]", "", item)
    print(item)
Dennis Sylvian asked Nov 05 '13 16:11





2 Answers

Remove the square brackets; you are not matching a character class:

item = re.sub(r" \(\w+\)", "", item)

Demo:

>>> import re
>>> items = ["x (LOC)", "ds ds (32C)", "d'ds ds (LeC)", "ds-d da(LOQ)", "12345 (deC)"]
>>> for item in items:
...     print(re.sub(r" \(\w+\)", "", item))
... 
x
ds ds
d'ds ds
ds-d da(LOQ)
12345

The second-to-last example has no space before the opening parenthesis, so it doesn't match. You could make the space optional if you need that pattern to work too:

item = re.sub(r" ?\(\w+\)", "", item)
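Applying the optional-space pattern to the question's sample data (Python 3 syntax) shows that all five strings, including `ds-d da(LOQ)`, are now cleaned:

```python
import re

items = ["x (LOC)", "ds ds (32C)", "d'ds ds (LeC)", "ds-d da(LOQ)", "12345 (deC)"]

# " ?" makes the leading space optional, so "da(LOQ)" is matched as well.
cleaned = [re.sub(r" ?\(\w+\)", "", item) for item in items]
print(cleaned)  # ['x', 'ds ds', "d'ds ds", 'ds-d da', '12345']
```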

Perhaps matching anything that isn't a closing parenthesis would work for you as well:

item = re.sub(r" ?\([^)]+\)", "", item)

This accepts a wider range of characters inside the parentheses than \w does (spaces and hyphens, for example).
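To make the difference between the two patterns concrete, here is a small sketch using two made-up strings (they are not from the question) whose parenthesized parts contain a hyphen and a space. The `\w+` version leaves them untouched, while the `[^)]+` version removes them:

```python
import re

# Hypothetical inputs: the parenthesized parts contain "-" and " ",
# neither of which is a \w character.
samples = ["y (A-B)", "z (two words)"]

# \w+ cannot cross "-" or " ", so no match is found and nothing is removed.
narrow = [re.sub(r" ?\(\w+\)", "", s) for s in samples]
# [^)]+ accepts any character except ")", so the whole group is removed.
wide = [re.sub(r" ?\([^)]+\)", "", s) for s in samples]

print(narrow)  # ['y (A-B)', 'z (two words)']
print(wide)    # ['y', 'z']
```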

In a regular expression, square brackets, [...], denote a character class: a set of characters, one of which is matched once. The class [ \(\w+\)] means: match one character if it is a space, an opening parenthesis, any character in the \w class, a plus sign (+), or a closing parenthesis.
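Running the question's original pattern over the sample data illustrates this. Each character that belongs to the set is deleted on its own, so almost everything disappears and only characters outside the set (the apostrophe and the hyphen) survive:

```python
import re

items = ["x (LOC)", "ds ds (32C)", "d'ds ds (LeC)", "ds-d da(LOQ)", "12345 (deC)"]

# [ \(\w+\)] is a character class: it matches ONE character that is a space,
# "(", a word character (\w), "+", or ")". re.sub deletes every such
# character individually instead of removing a "( ... )" group as a unit.
results = [re.sub(r"[ \(\w+\)]", "", item) for item in items]
print(results)  # ['', '', "'", '-', '']
```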

Martijn Pieters answered Oct 12 '22 22:10



Everything within square brackets is treated as a set of single characters, irrespective of the order in which you write them, because [ ... ] is a character class. Remove the brackets entirely:

r" \(\w+\)"

And I would add a ? for an optional space:

r" ?\(\w+\)"

regex101 demo
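The point about order can be checked directly: rearranging the members of a character class does not change what it matches. In this sketch the reordered pattern is made up purely for illustration; it contains the same members as the question's class:

```python
import re

items = ["x (LOC)", "ds ds (32C)", "d'ds ds (LeC)", "ds-d da(LOQ)", "12345 (deC)"]

original = r"[ \(\w+\)]"  # the class from the question
reordered = r"[)+\w( ]"   # same members, different order (illustrative only)

# Both classes match exactly the same single characters, so re.sub
# produces identical results for every input.
same = all(re.sub(original, "", s) == re.sub(reordered, "", s) for s in items)
print(same)  # True
```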

Jerry answered Oct 13 '22 00:10
