How to extract regex comment

Question

I have a regex like this:

(?<!(\w/))$#Cannot end with a word and slash

I would like to extract the comment from the end. Although the example above does not show it, a regex may itself match hash characters, for example:

\##value must be a hash

What regex would extract the comment while staying safe against patterns that contain #'s which are not comments?

Kobi · Accepted Answer

Here's a .Net flavored Regex for partly parsing .Net flavor patterns, which should get pretty close:

\A
(?>
    \\.         # Capture an escaped character
    |           # OR
    \[\^?       # a character class
        (?:\\.|[^\]])*    # which may also contain escaped characters
    \]
    |           # OR
    \(\?(?# inline comment!)\#
        (?<Comment>[^)]*)
    \)
    |           # OR
    \#(?<Comment>.*$)   # a common comment!
    |           # OR
    [^\[\#]    # capture any regular character - not # or [
)*
\z

Luckily, in .Net each capturing group remembers all of its captures, not just the last one, so we can find every capture of the Comment group in a single parse. The regex is a partial parser for regular expressions: it is far from complete and parses just enough to find comments.
Here's how you use the result:

Match parsed = Regex.Match(pattern, pattern,
                           RegexOptions.IgnorePatternWhitespace | 
                           RegexOptions.Multiline);
if (parsed.Success)
{
    foreach (Capture capture in parsed.Groups["Comment"].Captures)
    {
        Console.WriteLine(capture.Value);
    }
}

Working example: http://ideone.com/YP3yt
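
For readers who cannot reach the linked example, here is a minimal self-contained sketch of the same approach. The class, method and constant names (RegexCommentExtractor, ExtractComments, CommentFinder) are made up for illustration and are not part of the original answer; the sample input is the two patterns from the question joined by a newline.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexCommentExtractor
{
    // The comment-finding pattern from the answer, written for
    // IgnorePatternWhitespace mode (the name CommentFinder is hypothetical).
    private const string CommentFinder = @"
\A
(?>
    \\.                     # an escaped character
    |
    \[\^?                   # a character class
        (?:\\.|[^\]])*      # which may contain escaped characters
    \]
    |
    \(\?\#                  # an inline comment
        (?<Comment>[^)]*)
    \)
    |
    \#(?<Comment>.*$)       # an end-of-line comment
    |
    [^\[\#]                 # any other character
)*
\z";

    // Returns every capture of the Comment group, in order of appearance.
    public static List<string> ExtractComments(string pattern)
    {
        var comments = new List<string>();
        Match parsed = Regex.Match(pattern, CommentFinder,
            RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
        if (parsed.Success)
        {
            foreach (Capture capture in parsed.Groups["Comment"].Captures)
            {
                comments.Add(capture.Value);
            }
        }
        return comments;
    }

    public static void Main()
    {
        // Lines are separated with \n so that a comment ends at the line break.
        string sample = "(?<!(\\w/))$#Cannot end with a word and slash\n" +
                        "\\##value must be a hash";
        foreach (string comment in ExtractComments(sample))
        {
            Console.WriteLine(comment);
        }
    }
}
```

In this sketch the escaped \# in the second line is consumed by the "escaped character" branch, so only the text after the second # is reported as a comment.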

One last word of caution - this regex assumes the whole pattern is in IgnorePatternWhitespace mode. When it isn't set, all # are matched literally. Keep in mind the flag might change multiple times in a single pattern. In (?-x)#(?x)#comment, for example, regardless of IgnorePatternWhitespace, the first # is matched literally, (?x) turns the IgnorePatternWhitespace flag back on, and the second # is ignored.
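
The flag toggling described above can be checked with a small sketch. The class and field names (InlineOptionDemo, Pattern) are illustrative only; the pattern itself is the one from the paragraph above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineOptionDemo
{
    // (?-x) switches x-mode off, (?x) switches it back on.
    public const string Pattern = @"(?-x)#(?x)#comment";

    public static void Main()
    {
        // Try the pattern both without and with the global flag.
        var optionSets = new[] { RegexOptions.None, RegexOptions.IgnorePatternWhitespace };
        foreach (RegexOptions options in optionSets)
        {
            var regex = new Regex(Pattern, options);

            // The first # is literal, so an input without a hash cannot match.
            Console.WriteLine(regex.IsMatch("no hash here"));

            // The second # starts a comment, so only one hash is consumed.
            Console.WriteLine(regex.Match("##comment").Value);
        }
    }
}
```

A comment finder that assumes x-mode is on everywhere would report everything after the first # of this pattern as a comment, which is the limitation the caution refers to.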

If you want a robust solution you can use a regex-language parser.
You can probably adapt the .Net source code and extract a parser:

Reference Source - RegexParser.cs
GitHub - RegexParser.cs

Anon. · Answer

Something like this should work (if you run it separately on each line of the regex). The comment itself (if it exists) will be in the third capturing group.

/^((\\.)|[^\\#])*\#(.*)/

(\\.) matches an escaped character and [^\\#] matches any character that is neither a backslash nor a hash; together with the * quantifier they match the entire line before the comment. The rest of the regex then detects the comment marker and extracts the text.
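
A rough C# sketch of running this pattern separately on each line might look like the following. The names LineCommentExtractor and ExtractComments are hypothetical, and the surrounding slashes are dropped because .Net patterns are plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineCommentExtractor
{
    // Skip escaped pairs and ordinary characters, then capture
    // everything after the first unescaped # in group 3.
    private static readonly Regex LineComment =
        new Regex(@"^((\\.)|[^\\#])*\#(.*)");

    public static List<string> ExtractComments(string pattern)
    {
        var comments = new List<string>();
        foreach (string line in pattern.Split('\n'))
        {
            // Drop a trailing \r so Windows line endings do not leak into the comment.
            Match m = LineComment.Match(line.TrimEnd('\r'));
            if (m.Success)
            {
                comments.Add(m.Groups[3].Value);
            }
        }
        return comments;
    }

    public static void Main()
    {
        string sample = "(?<!(\\w/))$#Cannot end with a word and slash\n" +
                        "\\##value must be a hash";
        foreach (string comment in ExtractComments(sample))
        {
            Console.WriteLine(comment);
        }
    }
}
```

Note that this simpler pattern does not know about character classes, so a line such as [#]x is reported as having the comment ]x.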

Tags:

regex

Valamas
