
Regex pattern to choose data BETWEEN matching quotation marks

Tags: c#, regex, vb.net

Suppose I have the following string that I want to run a regular expression on:

This is a test string with "quotation-marks" within it.
The "problem" I am having, per-se, is "knowing" which "quotation-marks"
go with which words.

Now, suppose I wanted to replace all the - characters between the quotation marks with, say, a space. I was thinking I could do so with a regex looking as follows:

Find What:      (\"[^"]*?)(\-)([^"]*?\")
Replace With:   $1 $3

The problem is that this pattern does not take into account whether a quotation mark opens or closes a quoted section.

So, in the example above, the - character in per-se will also be replaced by a space, because it sits between two quotation marks, even though those are a closing mark followed by an opening one. I specifically want to look only at the text between an opening mark and its closing mark.
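
To make the failure concrete, here is a minimal C# sketch that runs the pattern from the question against the second line of the sample text. The class and method names (NaivePatternDemo, ReplaceNaive) are hypothetical and only there for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NaivePatternDemo
{
    // Pattern and replacement as given in the question: (\"[^"]*?)(\-)([^"]*?\")  ->  $1 $3
    // (written as a verbatim string, so "" stands for a single quote character)
    public static string ReplaceNaive(string text) =>
        Regex.Replace(text, @"(\""[^""]*?)(\-)([^""]*?\"")", "$1 $3");

    public static void Main()
    {
        var line = "The \"problem\" I am having, per-se, is \"knowing\" which \"quotation-marks\"";

        // The closing quote of "problem" and the opening quote of "knowing"
        // are treated as a pair, so the hyphen in per-se is replaced too.
        Console.WriteLine(ReplaceNaive(line));
        // The "problem" I am having, per se, is "knowing" which "quotation marks"
    }
}
```

Note that per-se loses its hyphen even though it sits outside any quoted section.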

How do you account for this in such a regular expression?

I hope this makes sense.

I'm using VB / C# Regex.


Just to complete the question (and hopefully elaborate a bit more if necessary), the end result I would like to get would be:

This is a test string with "quotation marks" within it.
The "problem" I am having, per-se, is "knowing" which "quotation marks"
go with which words.

Thanks!!

John Bustos asked Jan 13 '14 21:01


2 Answers

You are having the same problem as someone who is trying to match HTML or opening and closing parentheses with a regex. Regular expressions can only match regular languages, and knowing whether a given " is an opening or a closing one is out of reach in anything but the trivial cases.

EDIT: As shown in Vasili Syrakis's answer, sometimes it can be done but regex is a fragile solution for this type of problem.

With that said, you can reduce your problem to the trivial case. Since you are using .NET, you can simply match every quoted string and use the Regex.Replace overload that takes a match evaluator.

Regex.Replace(text, "\".*?\"", m => m.Value.Replace("-", " "))

Test:

using System;
using System.Text.RegularExpressions;

var text = @"This is a test string with ""quotation-marks"" within it.
The ""problem"" I am having, per-se, is ""knowing"" which ""quotation-marks""
go with which words.";

// Each match is one complete quoted string; the evaluator rewrites only that match.
Console.Write(Regex.Replace(text, "\".*?\"", m => m.Value.Replace("-", " ")));
//This is a test string with "quotation marks" within it.
//The "problem" I am having, per-se, is "knowing" which "quotation marks"
//go with which words.
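
This works because Regex.Replace consumes matches from left to right without overlapping, so every quotation mark is used exactly once, either as the opening or as the closing mark of a pair. The following is a sketch of a small variation, not part of the answer above: it assumes you may also have quoted strings that span a line break (by default . does not match \n, while [^"] does) or that contain several hyphens. The names QuotedHyphenDemo and ReplaceInsideQuotes are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedHyphenDemo
{
    // Hypothetical helper: replaces every '-' inside each "..." pair with a space.
    // "[^"]*" matches from one quote to the next one, line breaks included.
    public static string ReplaceInsideQuotes(string text) =>
        Regex.Replace(text, "\"[^\"]*\"", m => m.Value.Replace("-", " "));

    public static void Main()
    {
        var sample = "a \"one-two-three\" b, per-se, \"multi-\nline\"";

        // Hyphens inside both quoted strings become spaces; per-se is left alone.
        Console.WriteLine(ReplaceInsideQuotes(sample));
    }
}
```

This still assumes the quotation marks in the input are balanced; a stray unmatched " shifts every pair after it.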
dee-see answered Nov 14 '22 23:11


Busted my brain working this one out; it turns out that specifying non-word boundaries (\B) does the trick:

Regex

\B("[^"]*)-([^"]*")\B

Replacement

$1 $2


Demo

http://regex101.com/r/dS0bH8
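
For reference, a minimal C# sketch of the pattern above, together with two inputs where it behaves differently from what the question asks for. The class and method names (BoundaryPatternDemo, ReplaceWithBoundaries) and the extra sample strings are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryPatternDemo
{
    // The pattern from this answer: \B("[^"]*)-([^"]*")\B  ->  $1 $2
    public static string ReplaceWithBoundaries(string text) =>
        Regex.Replace(text, @"\B(""[^""]*)-([^""]*"")\B", "$1 $2");

    public static void Main()
    {
        // Sample line from the question: per-se is left alone.
        Console.WriteLine(ReplaceWithBoundaries(
            "The \"problem\" I am having, per-se, is \"knowing\" which \"quotation-marks\""));
        // The "problem" I am having, per-se, is "knowing" which "quotation marks"

        // Only one hyphen per quoted string is replaced in a single pass.
        Console.WriteLine(ReplaceWithBoundaries("\"a-b-c\""));
        // "a-b c"

        // Punctuation next to the quotes defeats the \B check, so the hyphen
        // outside the quoted sections is replaced.
        Console.WriteLine(ReplaceWithBoundaries("\"Stop!\" he said, non-stop, \"(bye)\""));
        // "Stop!" he said, non stop, "(bye)"
    }
}
```

So the pattern relies on opening quotes being followed, and closing quotes being preceded, by a word character.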

Vasili Syrakis answered Nov 14 '22 23:11