Suppose I have the following text in a text file:
First Text
"Some Text"
"124arandom txt that should not be parsed!@
"124 Some Text"
"어떤 글"
this text a"s well should not be parsed
I would like to retrieve Some Text, 124 Some Text and 어떤 글 as matched strings. The text is read line by line. The catch is that it has to match text in foreign languages as well, as long as it is inside quotes.
Update: I found out something weird. I was trying some random stuff and found out that:
string s = "어떤 글";
Regex regex = new Regex("[^\"]*");
MatchCollection matches = regex.Matches(s);
matches has a Count of 10 and contains some empty items (the parsed text is at index 2). This might be why I kept getting an empty string when I was just doing Regex.Replace. Why is this happening?
If you read the text line by line, then the regex
"[^"]*"
will find all quoted strings, unless those may contain escaped quotes like "a 2\" by 4\" board".
To match those correctly, you need
"(?:\\.|[^"\\])*"
If you don't want the quotes to become part of the match, use lookaround assertions:
(?<=")[^"]*(?=")
(?<=")(?:\\.|[^"\\])*(?=")
These regexes, as C# regexes, can be created like this:
Regex regex1 = new Regex(@"(?<="")[^""]*(?="")");
Regex regex2 = new Regex(@"(?<="")(?:\\.|[^""\\])*(?="")");
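One thing to watch with the lookaround versions: if a line holds more than one quoted string, such as "a" b "c", they will also return the text between the closing quote of one string and the opening quote of the next (here the b with its surrounding spaces). A capture group avoids that, because the quotes are consumed as part of the match. Below is a minimal sketch of that alternative, not a definitive implementation; the names QuotedTextExtractor, Extract and allowEscapes are made up for this example, and it assumes the lines have already been read from the file.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the question.
static class QuotedTextExtractor
{
    // The regex engine sees: "([^"]*)"
    // Group 1 is the text between the quotes.
    static readonly Regex Simple = new Regex(@"""([^""]*)""");

    // The regex engine sees: "((?:\\.|[^"\\])*)"
    // Same idea, but a backslash-escaped quote may appear inside the string.
    static readonly Regex EscapeAware = new Regex(@"""((?:\\.|[^""\\])*)""");

    public static List<string> Extract(IEnumerable<string> lines, bool allowEscapes = false)
    {
        Regex regex = allowEscapes ? EscapeAware : Simple;
        var found = new List<string>();
        foreach (string line in lines)
        {
            foreach (Match m in regex.Matches(line))
            {
                // Groups[1] holds the content without the surrounding quotes.
                found.Add(m.Groups[1].Value);
            }
        }
        return found;
    }
}
```

Called with the six sample lines from the question, Extract should return Some Text, 124 Some Text and 어떤 글. The two lines with a single unbalanced quote produce no match, and the Korean text needs no special handling because [^"] accepts any character other than the quote. With allowEscapes set to true, the backslashes of an escaped quote stay in the returned text, so they would still need to be unescaped separately if that matters.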