How can I look for keywords that are not inside a string?
For example, if I have the text:
Hello this text is an example.
bla bla bla "this text is inside a string"
"random string" more text bla bla bla "foo"
I would like to be able to match every occurrence of the word text that is not inside " ". In other words, I would like to match the text on the first line and the text in "more text" on the third line.
Note that I do not want to match the text on the second line, because it is inside a string.
Possible solution:
I have been working on it and this is what I have so far:
(?s)((?<q>")|text)(?(q).*?"|)
Note that regex writes the if statement as: (?(predicate) true alternative|false alternative)
So the regex reads:
Find " or text. If you found ", keep selecting until you find " again (.*?"); if you found text, do nothing more.
When I run that regex, though, the quoted strings are matched in full as well. I am asking this question for the purpose of learning. I know I could remove all strings first and then look for what I need.
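For what it's worth, here is a minimal C# sketch of how this pattern seems to behave on the sample text. It returns each quoted string in full as well as each bare text, so keeping only the matches where group q did not take part leaves the occurrences outside strings. The class and method names (ConditionalDemo, FindOutsideStrings) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ConditionalDemo
{
    // The pattern from the question, unchanged.
    private static readonly Regex Pattern =
        new Regex("(?s)((?<q>\")|text)(?(q).*?\"|)");

    // Hypothetical helper: start index of every "text" found outside a string.
    public static List<int> FindOutsideStrings(string input)
    {
        var hits = new List<int>();
        foreach (Match m in Pattern.Matches(input))
        {
            // When q captured a quote, the match is a whole quoted string: skip it.
            if (!m.Groups["q"].Success)
                hits.Add(m.Index);
        }
        return hits;
    }

    public static void Main()
    {
        string input =
            "Hello this text is an example.\n" +
            "bla bla bla \"this text is inside a string\"\n" +
            "\"random string\" more text bla bla bla \"foo\"";

        foreach (int index in FindOutsideStrings(input))
            Console.WriteLine(index);
    }
}
```

On the sample this should report two positions, one on the first line and one on the third.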
In order to use a literal ^ at the start or a literal $ at the end of a regex, the character must be escaped. Some flavors only use ^ and $ as metacharacters when they are at the start or end of the regex respectively. In those flavors, no additional escaping is necessary. It's usually just best to escape them anyway.
Firstly, the double quote character is nothing special in regex: it's just another character, so it doesn't need escaping from the perspective of regex. However, because Java uses double quotes to delimit String constants, if you want to create a string in Java with a double quote in it, you must escape it.
The period (.) represents the wildcard character. Any character (except for the newline character) will be matched by a period in a regular expression; when you literally want a period in a regular expression you need to precede it with a backslash.
Try putting a backslash ( \ ) followed by " .
Here is one answer:
(?<=^([^"]|"[^"]*")*)text
This means:
(?<=           # preceded by...
  ^            # the start of the string, then
  ([^"]        # either not a quote character
  |"[^"]*"     # or a full string
  )*           # as many times as you want
)
text           # then the text
You can easily extend this to handle strings containing escapes as well.
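To try the basic pattern on the full sample from the question, a sketch like the following should do. It uses Regex.Matches so that every occurrence is reported; the class and method names (LookbehindDemo, FindOutsideStrings) are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Same pattern as above; without Multiline, ^ is the start of the whole input.
    private const string Pattern = "(?<=^([^\"]|\"[^\"]*\")*)text";

    // Hypothetical helper: start index of every "text" preceded only by
    // non-quote characters and complete "..." strings.
    public static List<int> FindOutsideStrings(string input)
    {
        var hits = new List<int>();
        foreach (Match m in Regex.Matches(input, Pattern, RegexOptions.ExplicitCapture))
            hits.Add(m.Index);
        return hits;
    }

    public static void Main()
    {
        string input =
            "Hello this text is an example.\n" +
            "bla bla bla \"this text is inside a string\"\n" +
            "\"random string\" more text bla bla bla \"foo\"";

        foreach (int index in FindOutsideStrings(input))
            Console.WriteLine(index);
    }
}
```

Because the lookbehind is anchored at the start of the whole input here, a single unclosed quote anywhere earlier in the text prevents every later match.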
In C# code:
Match m = Regex.Match("bla bla bla \"this text is inside a string\"",
    "(?<=^([^\"]|\"[^\"]*\")*)text", RegexOptions.ExplicitCapture);
// m.Success is false here, because the only text is inside the quoted string.
Added from comment discussion: an extended version that matches on a per-line basis and handles escapes. Use RegexOptions.Multiline for this:
(?<=^([^"\r\n]|"([^"\\\r\n]|\\.)*")*)text
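A rough sketch of running this per-line version, assuming RegexOptions.Multiline and a made-up two-line input that contains escaped quotes. The pattern is written as a verbatim string here only to keep the escaping readable; the names EscapeAwareDemo and FindOutsideStrings are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EscapeAwareDemo
{
    // Verbatim string: "" stands for one quote, backslashes are passed through as-is.
    private const string Pattern = @"(?<=^([^""\r\n]|""([^""\\\r\n]|\\.)*"")*)text";

    // Hypothetical helper: start index of every "text" outside a string, line by line.
    public static List<int> FindOutsideStrings(string input)
    {
        var options = RegexOptions.Multiline | RegexOptions.ExplicitCapture;
        var hits = new List<int>();
        foreach (Match m in Regex.Matches(input, Pattern, options))
            hits.Add(m.Index);
        return hits;
    }

    public static void Main()
    {
        // Raw lines:
        //   say "a \"quoted\" text" then text
        //   text "text"
        string input =
            "say \"a \\\"quoted\\\" text\" then text\n" +
            "text \"text\"";

        foreach (int index in FindOutsideStrings(input))
            Console.WriteLine(index);
    }
}
```

With this input only the last text on the first line and the first text on the second line should be reported, since the escaped quotes do not end the string.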
In a C# string this looks like:
"(?<=^([^\"\r\n]|\"([^\"\\\\\r\n]|\\\\.)*\")*)text"
Since you now want to use ** instead of ", here is a version for that:
(?<=^([^*\r\n]|\*(?!\*)|\*\*([^*\\\r\n]|\\.|\*(?!\*))*\*\*)*)text
Explanation:
(?<=                 # preceded by
  ^                  # start of line
  (                  # either
    [^*\r\n]|        # not a star or line break
    \*(?!\*)|        # or a single star (star not followed by another star)
    \*\*             # or 2 stars, followed by...
      ([^*\\\r\n]    # either: not a star or a backslash or a linebreak
      |\\.           # or an escaped char
      |\*(?!\*)      # or a single star
      )*             # as many times as you want
    \*\*             # ended with 2 stars
  )*                 # as many times as you want
)
text                 # then the text
Since this version doesn't contain " characters, it's cleaner to use a verbatim string literal:
@"(?<=^([^*\r\n]|\*(?!\*)|\*\*([^*\\\r\n]|\\.|\*(?!\*))*\*\*)*)text"
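As a quick check of the ** version, here is a sketch that assumes RegexOptions.Multiline (the explanation treats ^ as the start of a line) and a made-up input line. The names StarDelimiterDemo and FindOutsideStars are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StarDelimiterDemo
{
    // The verbatim-string pattern from above, unchanged.
    private const string Pattern =
        @"(?<=^([^*\r\n]|\*(?!\*)|\*\*([^*\\\r\n]|\\.|\*(?!\*))*\*\*)*)text";

    // Hypothetical helper: start index of every "text" outside a **...** span.
    public static List<int> FindOutsideStars(string input)
    {
        var options = RegexOptions.Multiline | RegexOptions.ExplicitCapture;
        var hits = new List<int>();
        foreach (Match m in Regex.Matches(input, Pattern, options))
            hits.Add(m.Index);
        return hits;
    }

    public static void Main()
    {
        // A lone * is ordinary text; only a ** ... ** pair delimits a span.
        string input = "plain text **bold text** more text * text";

        foreach (int index in FindOutsideStars(input))
            Console.WriteLine(index);
    }
}
```

Here the text inside **bold text** should be skipped, while the other three occurrences, including the one after the single star, are reported.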