Basically, my goal is to remove everything inside parentheses, except for strings that are inside double quotes.
I was following the code here: Remove text in-between delimiters in a string (using a regex?)
That works great, but I have the additional requirement of not removing parentheses if they are inside double quotes. Is that something that can be done with a regular expression? I feel like I'm dangerously close to needing another approach, like a true parser.
This is what I've been using:
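// Requires: using System.Text.RegularExpressions;
string RemoveBetween(string s, char begin, char end)
{
    // Regex.Escape makes the delimiters literal; a bare backslash prefix
    // would turn a delimiter such as 'b' into the \b word-boundary anchor.
    Regex regex = new Regex(string.Format("{0}.*?{1}",
        Regex.Escape(begin.ToString()), Regex.Escape(end.ToString())));
    // Remove each shortest begin...end span, delimiters included.
    return regex.Replace(s, string.Empty);
}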
.NET regexes are more powerful than most regex flavors, so you can almost certainly do what you want with one. Take a look at this post, which matches balanced parentheses; that is essentially the same problem as yours, only with parentheses instead of quotes.
http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
I don't speak C#, but here's a Java implementation:
input.replaceAll("(?<=\\().*?(?=[\"()])(\"([^\"]*)\")?.*(?=\\))", "$2");
This produces the following results:
"foo (bar \"hello world\" foo) bar" --> "foo (hello world) bar"
"foo (bar foo) bar" --> "foo () bar"
It wasn't clear whether you wanted to preserve the quotes. If you did, use $1 instead of $2.
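To make the two replacement choices concrete, here is a minimal runnable sketch that wraps the pattern above in a small class. The class and method names (KeepQuotedDemo, keepWithoutQuotes, keepWithQuotes) are made up for illustration; only the pattern and the $1/$2 replacements come from the answer.

```java
class KeepQuotedDemo {
    // The pattern from the answer above, unchanged.
    static final String PATTERN =
        "(?<=\\().*?(?=[\"()])(\"([^\"]*)\")?.*(?=\\))";

    // $2 is the quoted text without its surrounding quotes.
    static String keepWithoutQuotes(String input) {
        return input.replaceAll(PATTERN, "$2");
    }

    // $1 is the quoted text including its surrounding quotes.
    static String keepWithQuotes(String input) {
        return input.replaceAll(PATTERN, "$1");
    }

    public static void main(String[] args) {
        String sample = "foo (bar \"hello world\" foo) bar";
        System.out.println(keepWithoutQuotes(sample)); // foo (hello world) bar
        System.out.println(keepWithQuotes(sample));    // foo ("hello world") bar
        System.out.println(keepWithoutQuotes("foo (bar foo) bar")); // foo () bar
    }
}
```

Note that the trailing .* is greedy, so it runs to the last ) in the input. This sketch is only meant for input with a single parenthesized group and at most one quoted string inside it.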
Now that you've got a working regex, you should be able to adapt it to C#.
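If the regex turns out to be too fragile, the non-regex route mentioned in the question can be sketched as a single pass over the characters. This is only a rough sketch under some assumptions that the question does not spell out: the parentheses themselves are kept (as in the Java example above), quoted strings are kept together with their quotes, parentheses inside quotes are treated as plain text, and escaped quotes are not handled. The names ParenScanner and stripOutsideQuotes are hypothetical.

```java
class ParenScanner {
    // Drops text inside (...) but keeps any double-quoted strings found there.
    // Assumption: no escaped quotes (\") inside quoted strings.
    static String stripOutsideQuotes(String s) {
        StringBuilder out = new StringBuilder();
        int depth = 0;            // current parenthesis nesting depth
        boolean inQuotes = false; // true while inside a "..." string
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                out.append(c);                 // quotes are always kept
            } else if (inQuotes) {
                out.append(c);                 // quoted text is always kept
            } else if (c == '(') {
                if (depth == 0) out.append(c); // keep only the outermost (
                depth++;
            } else if (c == ')') {
                if (depth > 0) depth--;
                if (depth == 0) out.append(c); // keep only the outermost )
            } else if (depth == 0) {
                out.append(c);                 // text outside parentheses
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripOutsideQuotes("foo (bar \"hello world\" foo) bar"));
        System.out.println(stripOutsideQuotes("a \"(keep)\" (drop) b"));
    }
}
```

Unlike the regex, this handles several parenthesized groups and nested parentheses in one input, at the cost of a few more lines.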