I have a regex like this:
(?<!(\w/))$#Cannot end with a word and slash
I would like to extract the comment from the end. This example does not show it, but the regex could also contain hashes that are part of the pattern rather than comments, for example an escaped hash:
\##value must be a hash
What regex would extract the comment while staying safe against patterns that contain #'s which are not comments?
Here's a .Net flavored Regex for partly parsing .Net flavor patterns, which should get pretty close:
\A
(?>
\\. # Capture an escaped character
| # OR
\[\^? # a character class
(?:\\.|[^\]])* # which may also contain escaped characters
\]
| # OR
\(\?(?# inline comment!)\#
(?<Comment>[^)]*)
\)
| # OR
\#(?<Comment>.*$) # a common comment!
| # OR
[^\[\\#] # capture any regular character - not [, \ or #
)*
\z
Luckily, in .Net each capturing group remembers all of its captures, not just the last one, so we can find every capture of the Comment group in a single parse. The regex does not parse regular expressions fully; it parses just enough to find comments.
Here's how you use the result:
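// commentParser is an assumed name for a string holding the regex above;
// pattern is the regex whose comments we want to extract.
Match parsed = Regex.Match(pattern, commentParser,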
RegexOptions.IgnorePatternWhitespace |
RegexOptions.Multiline);
if (parsed.Success)
{
foreach (Capture capture in parsed.Groups["Comment"].Captures)
{
Console.WriteLine(capture.Value);
}
}
Working example: http://ideone.com/YP3yt
One last word of caution - this regex assumes the whole pattern is in IgnorePatternWhitespace mode. When it isn't set, all # are matched literally. Keep in mind the flag might change multiple times in a single pattern. In (?-x)#(?x)#comment, for example, regardless of IgnorePatternWhitespace, the first # is matched literally, (?x) turns the IgnorePatternWhitespace flag back on, and the second # is ignored.
If you want a robust solution you can use a regex-language parser.
You can probably adapt the .Net source code and extract a parser:
Something like this should work (if you run it separately on each line of the regex). The comment itself (if it exists) will be in the third capturing group.
/^((\\.)|[^\\\#])*\#(.*)/
(\\.) matches an escaped character, and [^\\\#] matches any character that is neither a backslash nor a hash; together with the * quantifier they match everything on the line before the comment. The rest of the regex then matches the comment marker and captures the comment text.
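As a rough illustration of the per-line approach, here is a minimal Python sketch that applies the regex above to each line of a pattern. The helper names extract_comment and extract_comments are made up for this example, and the sketch assumes the whole pattern is written in whitespace-ignoring (extended) mode.

```python
import re

# The per-line regex from the answer above: skip escaped characters and any
# character that is neither a backslash nor a hash, then capture everything
# after the first unescaped '#'.
COMMENT_RE = re.compile(r'^((\\.)|[^\\\#])*\#(.*)')


def extract_comment(line):
    """Return the comment text of one pattern line, or None if it has none.

    Hypothetical helper; the comment is the third capturing group.
    """
    match = COMMENT_RE.match(line)
    return match.group(3) if match else None


def extract_comments(pattern):
    """Run extract_comment on every line and keep only the comments found."""
    found = (extract_comment(line) for line in pattern.splitlines())
    return [comment for comment in found if comment is not None]


if __name__ == "__main__":
    print(extract_comment(r'(?<!(\w/))$#Cannot end with a word and slash'))
    print(extract_comment(r'\##value must be a hash'))
```

Note that this simple regex knows nothing about character classes, so a hash inside one, such as [#], would be mistaken for the start of a comment.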