
Ruby regex for text within parentheses

Tags: regex, ruby

I am looking for a regex to replace all terms in parentheses unless the parentheses are within square brackets.

e.g.

(matches) #match
[(do not match)] #should not match
[[does (not match)]] #should not match

I currently have:

[^\]]\([^()]*\) #Not a square bracket, an opening bracket, any non-bracket character and a closing bracket.

However this is still matching words within the square brackets.

I have also created a rubular page of my progress so far: http://rubular.com/r/gG22pFk2Ld

Gazler asked May 23 '11 18:05



2 Answers

A regex is not going to cut it for you if you can nest the square brackets (see this related question).

I think you can only do this with a regex if (a) you only allow one level of square brackets and (b) you assume all square brackets are properly matched. In that case

\([^()]*\)(?![^\[]*])

is sufficient: it matches any parenthesised expression that is not followed by an unpaired ]. You need (b) because of the limitations of negative lookbehind (fixed-length patterns only in Ruby 1.9, and not supported at all in Ruby 1.8), which mean you are stuck matching (match)] even if you don't want to.
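
As a rough illustration of how that pattern behaves on the three examples from the question, here is a minimal sketch. The constant and the helper name `replace_parens` are made up for this example, and the closing `]` is written as `\]`, which should be equivalent and avoids a Ruby warning about an unescaped bracket.

```ruby
# Pattern from the answer: a parenthesised group that is NOT followed by
# a "]" occurring before the next "[".
PAREN_OUTSIDE_BRACKETS = /\([^()]*\)(?![^\[]*\])/

# Hypothetical helper (not part of the original answer): replaces every
# qualifying group with `replacement`.
def replace_parens(text, replacement = "")
  text.gsub(PAREN_OUTSIDE_BRACKETS, replacement)
end

puts replace_parens("(matches) #match", "X")
# prints: X #match
puts replace_parens("[(do not match)] #should not match", "X")
# prints: [(do not match)] #should not match
puts replace_parens("[[does (not match)]] #should not match", "X")
# prints: [[does (not match)]] #should not match
```

Note that it only looks ahead to the next bracket, so in an input such as `[a [b] (c) [d]]` the group `(c)` would still be matched, which is the nesting limitation described above.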

So basically if you need to nest, or to allow unmatched brackets, you should ditch the regex and look at the answer to the question I linked to above.
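
One possible shape for a non-regex-only solution is a single pass that counts square-bracket depth. This is only a sketch, not the code from the linked answer: the helper name `replace_outside_brackets` is hypothetical, and it assumes an unmatched `]` at depth zero should simply be ignored.

```ruby
require "strscan"

# Hypothetical helper: walks the string once, tracking how deeply we are
# nested inside square brackets, and replaces "(...)" groups only at depth 0.
def replace_outside_brackets(text, replacement = "")
  scanner = StringScanner.new(text)
  out = []
  depth = 0 # current nesting level of square brackets

  until scanner.eos?
    if scanner.scan(/\[/)
      depth += 1
      out << "["
    elsif scanner.scan(/\]/)
      depth -= 1 if depth > 0 # assumption: ignore an unmatched "]"
      out << "]"
    elsif depth.zero? && scanner.scan(/\([^()\[\]]*\)/)
      # A bracket-free, paren-free group outside all square brackets.
      out << replacement
    else
      out << scanner.getch # copy any other character unchanged
    end
  end

  out.join
end

puts replace_outside_brackets("[a [b] (c) [d]] (e)", "X")
# prints: [a [b] (c) [d]] X
```

Because the depth counter remembers how many brackets are open, the nested case that defeats the lookahead pattern is handled here.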

Andrew Haines answered Nov 09 '22 02:11



This is a type of expression you cannot parse using a pure-regex approach, because you need to keep track of the current nesting, i.e. a state_if_in_square_bracket (so you don't have a type 3, that is regular, language anymore).

However, depending on the exact circumstances, you can parse it with multiple regexes or simple parsers. Example approaches:

  • Split the string into substrings delimited by [ / [[ or ] / ]], change the state whenever such a square bracket is encountered, and replace () in a substring only while in the "not_in_square_bracket" state.
  • Parse for square brackets (including their content), remove and remember them (treat them as "comments"), replace all the content in normal brackets in what remains, then re-add the square-bracket parts (you can remember them by using unique temporary strings).
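
The second approach could be sketched roughly as follows. The helper name `replace_with_placeholders` and the marker character are invented for this example; it assumes the square brackets are properly matched, that `]` cannot be escaped, and that the private-use character U+E000 never occurs in the input or the replacement.

```ruby
# Assumption: this private-use character never appears in real input.
MARK = "\uE000"

# Hypothetical helper: hides bracketed spans behind unique temp strings,
# replaces the remaining "(...)" groups, then restores the hidden spans.
def replace_with_placeholders(text, replacement = "")
  stash = []
  work = text.dup
  innermost = /\[[^\[\]]*\]/ # a bracket pair with no brackets inside

  # Repeating the substitution hides nested brackets from the inside out.
  loop do
    changed = work.gsub!(innermost) do |span|
      stash << span
      "#{MARK}#{stash.size - 1}#{MARK}"
    end
    break unless changed
  end

  # Replace groups in what is left; groups containing a marker are skipped
  # so that stashed bracket content is never thrown away.
  work = work.gsub(/\([^()#{MARK}]*\)/, replacement)

  # Restore until no markers remain (outer spans contain inner markers).
  marker = /#{MARK}(\d+)#{MARK}/
  loop do
    break unless work.gsub!(marker) { stash[Regexp.last_match(1).to_i] }
  end

  work
end

puts replace_with_placeholders("[[does (not match)]] (x)", "X")
# prints: [[does (not match)]] X
```

Stashing the innermost pairs first is a design choice that lets a simple regex cope with nesting, since each pass only ever sees bracket pairs that no longer contain brackets.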

The complexity of your solution also depends on whether escaping ] is allowed.

J-_-L answered Nov 09 '22 02:11
