How to match a string in quotes using Regex

Suppose I have the following text in a text file:

First Text

"Some Text"

"124arandom txt that should not be parsed!@

"124 Some Text"

"어떤 글"

this text a"s well should not be parsed

I would like to retrieve Some Text, 124 Some Text and 어떤 글 as matches. The text is read line by line. The catch is that it also has to match non-English text, as long as that text is inside quotes.

Update: I found something odd while experimenting. With this code:

using System.Text.RegularExpressions;

string s = "어떤 글";
Regex regex = new Regex("[^\"]*");
MatchCollection matches = regex.Matches(s);

matches has a Count of 10 and contains several empty items (the text I want is at index 2). This may be why I kept getting an empty string when I was just using Regex.Replace. Why is this happening?
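A likely explanation, offered as a sketch: * means "zero or more", so [^"]* is allowed to match an empty string. Regex.Matches reports such a zero-length match wherever the next character is a quote, and once more at the very end of the input. The sketch below assumes the line was read from the file together with its quotes (the C# literal "\"어떤 글\""). It lists the matches for [^"]* and for [^"]+; the latter needs at least one character, so it never produces empty matches. The class and method names are hypothetical, and the count of 10 reported above cannot be reproduced without the exact input.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: shows that a pattern built on * can produce zero-length matches.
public static class EmptyMatchDemo
{
    // Returns the value of every match of pattern in input, including empty ones.
    public static List<string> AllMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

For the quoted line, AllMatches(line, "[^\"]*") returns four values: an empty string before the opening quote, 어떤 글, an empty string before the closing quote, and an empty string at the end of the line. AllMatches(line, "[^\"]+") returns only 어떤 글.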

asked Aug 08 '12 at 07:08 by l46kok



1 Answer

If you read the text line by line, then the regex

"[^"]*"

will find all quoted strings, unless they can contain escaped quotes, as in "a 2\" by 4\" board".
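As a sketch (the class and method names are hypothetical), the pattern can be applied to the sample lines one at a time. [^"] matches any character other than a quote, including Korean letters, so non-English text needs no special handling. A line with only one quote, such as the "124arandom... line or the a"s well line, has no closing quote and yields no match. The lines are passed in as a sequence; with a real file they could come from File.ReadLines(path).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedStrings
{
    // "[^"]*" : a quote, any run of non-quote characters, then a closing quote.
    // [^"] accepts any character except a quote, so Hangul and other scripts match too.
    private static readonly Regex Quoted = new Regex("\"[^\"]*\"");

    // Collects every quoted string from each line, with the surrounding quotes removed.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        var results = new List<string>();
        foreach (string line in lines)
        {
            foreach (Match m in Quoted.Matches(line))
            {
                // Drop the opening and closing quote from the matched text.
                results.Add(m.Value.Substring(1, m.Value.Length - 2));
            }
        }
        return results;
    }
}
```

Run on the six sample lines from the question, this returns Some Text, 124 Some Text and 어떤 글.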

To match those correctly, you need

"(?:\\.|[^"\\])*"
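The difference can be checked with a small sketch (names are hypothetical) that runs both patterns on "a 2\" by 4\" board". In the second pattern, \\. consumes a backslash together with the character after it, so an escaped quote cannot end the match, while [^"\\] consumes any other character except a quote or a backslash. Both patterns are written as C# verbatim strings, where "" stands for one quote.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedQuotes
{
    // "[^"]*" : stops at the first quote it meets, even an escaped one.
    public const string Simple = @"""[^""]*""";

    // "(?:\\.|[^"\\])*" : \\. eats an escape such as \" as a unit,
    // [^"\\] eats any other character that is not a quote or backslash.
    public const string WithEscapes = @"""(?:\\.|[^""\\])*""";

    // Returns the text of every match of pattern in input.
    public static List<string> Find(string input, string pattern) =>
        Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToList();
}
```

On that input, the simple pattern splits the string into two wrong pieces, "a 2\" and " board", whereas the escape-aware pattern matches the whole quoted string once.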

If you don't want the quotes to become part of the match, use lookaround assertions:

(?<=")[^"]*(?=")
(?<=")(?:\\.|[^"\\])*(?=")

In C#, these regexes can be created like this (inside a verbatim @"..." string, a literal quote is written as ""):

using System.Text.RegularExpressions;

Regex regex1 = new Regex(@"(?<="")[^""]*(?="")");
Regex regex2 = new Regex(@"(?<="")(?:\\.|[^""\\])*(?="")");
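One caveat worth checking, shown as a sketch: the lookaround form only tests the character before and the character after the match. On a line with two quoted strings, such as "a" and "b", the text " and " between them is also preceded and followed by a quote, so it is matched too. A swapped-in alternative (not part of the answer above) is to keep the quotes in the pattern, capture the content in a group, and read Groups[1]. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteContents
{
    // Lookaround version: the quotes are checked but not included in the match.
    private static readonly Regex Lookaround = new Regex(@"(?<="")[^""]*(?="")");

    // Capture-group alternative: consume the quotes, keep only the content in group 1.
    private static readonly Regex Captured = new Regex(@"""([^""]*)""");

    public static List<string> ViaLookaround(string line) =>
        Lookaround.Matches(line).Cast<Match>().Select(m => m.Value).ToList();

    public static List<string> ViaGroup(string line) =>
        Captured.Matches(line).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
}
```

For a line containing one quoted string, as in the question, both methods return the same result. For "a" and "b", ViaLookaround returns a, " and " and b, while ViaGroup returns only a and b, because each match consumes its closing quote.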
answered Oct 03 '22 at 23:10 by Tim Pietzcker
