
Remove Everything Between Two Characters As Long As They Aren't Inside Some Other Characters

Tags: c#, .net, regex

Basically, my goal is to remove everything inside ()'s except for strings that are inside "".

I was following the code here: Remove text in-between delimiters in a string (using a regex?)

And that works great, but I have the additional requirement of not removing ()s if they are inside "". Is that something that can be done with a regular expression? I feel like I'm dangerously close to needing another approach, like a true parser.

This is what I've been using:

string RemoveBetween(string s, char begin, char end)
{
    Regex regex = new Regex(string.Format("\\{0}.*?\\{1}", begin, end));
    return regex.Replace(s, string.Empty);
}
Rob P. asked Jun 05 '11 23:06



2 Answers

.NET regexes are more powerful than most other flavors, so you can certainly do what you want. Take a look at this post, which matches balanced parentheses. That is essentially the same problem as yours, except that it tracks parentheses rather than quotes.

http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
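
For what it's worth, here is a minimal sketch of that idea in C#. It assumes the goal is to remove whole parenthesized groups while treating any parentheses that sit inside double quotes as ordinary text. The class and method names are made up for illustration, and the pattern is an adaptation of the .NET balancing-group construct, not code taken from the linked post.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class QuoteAwareRemover
{
    // Alternative 1: a double-quoted string. It is tried first, so
    //   parentheses inside quotes are consumed as part of the string.
    // Alternative 2: a balanced (...) group. The "depth" stack is pushed
    //   on each nested '(' and popped on each nested ')'; the conditional
    //   (?(depth)(?!)) rejects the match if any '(' is still open.
    private static readonly Regex Pattern = new Regex(
        @"""[^""]*""" +
        @"|\((?>""[^""]*""|[^()""]+|\((?<depth>)|\)(?<-depth>))*(?(depth)(?!))\)");

    public static string RemoveParenthesized(string s)
    {
        // Keep quoted strings untouched; drop parenthesized groups.
        return Pattern.Replace(
            s, m => m.Value[0] == '"' ? m.Value : string.Empty);
    }
}
```

With this sketch, `QuoteAwareRemover.RemoveParenthesized("a(b(c)d)e")` returns `ae`, and `QuoteAwareRemover.RemoveParenthesized("x \"(y)\" z(w)")` returns `x "(y)" z`. Note that a quoted string inside a parenthesized group is removed along with the group; if you need to keep it, the replacement logic has to change.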

Mark Sowul answered Sep 27 '22 18:09



I don't speak C#, but here's a Java implementation:

input.replaceAll("(?<=\\().*?(?=[\"()])(\"([^\"]*)\")?.*(?=\\))", "$2");

This produces the following results:

"foo (bar \"hello world\" foo) bar" --> "foo (hello world) bar"
"foo (bar foo) bar" --> "foo () bar"

It wasn't clear whether you wanted to preserve the quotes. If you do, use $1 instead of $2.

Now that you've got a working regex, you should be able to port it to C#.
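
As a rough sketch of that port, the Java regex above should carry over to .NET unchanged, since both flavors support lookbehind, lookahead, and `$n` substitutions. The class and method names here are placeholders, not anything from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper for illustration only.
public static class ParenContents
{
    public static string KeepQuotedText(string input)
    {
        // Same pattern as the Java version; in a verbatim string
        // a literal double quote is written as "".
        // "$2" is the text between the quotes; an unmatched group
        // substitutes as the empty string.
        return Regex.Replace(
            input,
            @"(?<=\().*?(?=[""()])(""([^""]*)"")?.*(?=\))",
            "$2");
    }
}
```

`ParenContents.KeepQuotedText("foo (bar \"hello world\" foo) bar")` gives `foo (hello world) bar`, and `ParenContents.KeepQuotedText("foo (bar foo) bar")` gives `foo () bar`. One caveat: the trailing `.*` is greedy, so with several parenthesized groups on one line the match runs from the first `(` to the last `)`.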

Bohemian answered Sep 27 '22 17:09
