
What does a character class with only a lone caret do?

In trying to answer the question Writing text into new line when a particular character is found, I used Regexp::Grammars. It has long interested me, and I finally had a reason to learn it. I noticed that in the description section the author has a LaTeX parser (I am an avid LaTeX user, so this interested me), but it contains one odd construct, seen here:

    <rule: Option>     [^][\$&%#_{}~^\s,]+

    <rule: Literal>    [^][\$&%#_{}~^\s]+

What do the [^] character classes accomplish?

Joel Berger asked Jun 13 '11 15:06




1 Answer

[^][…] is not two character classes but a single negated character class: it matches any one character except ], [, and the other characters listed after them (see Special Characters Inside a Bracketed Character Class in perlrecharclass):

However, if the ] is the first (or the second if the first character is a caret) character of a bracketed character class, it does not denote the end of the class (as you cannot have an empty class) and is considered part of the set of characters that can be matched without escaping.

Examples:

    "+"   =~ /[+?*]/     #  Match, "+" in a character class is not special.
    "\cH" =~ /[\b]/      #  Match, \b inside a character class
                         #  is equivalent to a backspace.
    "]"   =~ /[][]/      #  Match, as the character class contains
                         #  both [ and ].
    "[]"  =~ /[[]]/      #  Match, the pattern contains a character class
                         #  containing just [, and the character class is
                         #  followed by a ].
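
Applied to the two classes from the question, the rule can be checked with a small sketch like the one below. The patterns are copied from the question. The helper `matches_fully`, the sample strings, and the escaped variant `$literal_escaped` are made up for illustration and are not part of Regexp::Grammars.

```perl
use strict;
use warnings;

# Character classes copied from the two rules in the question.
# Each is ONE negated class: "[^" opens it, the "]" right after the
# caret is a literal member, and the final "]" closes it.
my $option  = qr/[^][\$&%#_{}~^\s,]+/;
my $literal = qr/[^][\$&%#_{}~^\s]+/;

# The same Literal class with both brackets escaped, for comparison
# (an assumption about how one might write it more readably).
my $literal_escaped = qr/[^\]\[\$&%#_{}~^\s]+/;

# Hypothetical helper: 1 if the whole string is made only of characters
# the class allows, 0 otherwise.
sub matches_fully {
    my ($re, $text) = @_;
    return $text =~ /\A$re\z/ ? 1 : 0;
}

# Print how each sample fares against the three patterns.
for my $text ('hello', 'a]b', 'a[b', 'a^b', 'a,b') {
    printf "%-6s option=%d literal=%d escaped=%d\n",
        $text,
        matches_fully($option, $text),
        matches_fully($literal, $text),
        matches_fully($literal_escaped, $text);
}
```

Strings containing ], [ or ^ are rejected by all three patterns, while the comma is rejected only by the Option class, which is the one difference between the two rules. The escaped form should behave the same as the original Literal class. Putting ] first just avoids the backslashes.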
Gumbo answered Oct 04 '22 10:10
