 

How to use double brackets in a regular expression?

What do double square brackets mean in a regex? I am confused about the following examples:

/[[^abc]]/

/[^abc]/

I was testing with Rubular, but I didn't see any difference between the double-bracket and single-bracket versions.

runcode asked Sep 05 '12 06:09



2 Answers

POSIX character classes use the [:alpha:] notation and are written inside a bracket expression, like:

/[[:alpha:][:digit:]]/
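As a minimal sketch (the constant name `ALNUM` and the sample inputs are illustrative only), this Ruby snippet checks that the combined bracket expression above matches a single letter or digit and nothing else:

```ruby
# Two POSIX classes inside one bracket expression:
# matches a single character that is a letter OR a digit.
ALNUM = /[[:alpha:][:digit:]]/

%w[a 7 - Z].each do |s|
  puts "#{s.inspect}: #{(s =~ ALNUM).inspect}"
end
# Expected output:
# "a": 0
# "7": 0
# "-": nil
# "Z": 0
```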

You'll need to scroll down a ways in the link above to get to the POSIX information. From the docs:

POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.
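A small Ruby check of the difference the quote describes, using the same U+06F2 character as the documentation examples; it assumes a UTF-8 source file, which is Ruby's default since 2.0:

```ruby
# U+06F2 EXTENDED ARABIC-INDIC DIGIT TWO belongs to the Unicode Nd category.
arabic_two = "\u06F2"

ascii_digit   = arabic_two =~ /\d/           # \d is ASCII-only (0-9) in Ruby
unicode_digit = arabic_two =~ /[[:digit:]]/  # [[:digit:]] covers all of Nd

p ascii_digit    # => nil
p unicode_digit  # => 0
```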

/[[:alnum:]]/ - Alphabetic and numeric character
/[[:alpha:]]/ - Alphabetic character
/[[:blank:]]/ - Space or tab
/[[:cntrl:]]/ - Control character
/[[:digit:]]/ - Digit
/[[:graph:]]/ - Non-blank character (excludes spaces, control characters, and similar)
/[[:lower:]]/ - Lowercase alphabetical character
/[[:print:]]/ - Like [:graph:], but includes the space character
/[[:punct:]]/ - Punctuation character
/[[:space:]]/ - Whitespace character ([:blank:], newline, carriage return, etc.)
/[[:upper:]]/ - Uppercase alphabetical
/[[:xdigit:]]/ - Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F)
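To see several of these classes side by side, here is a sketch that runs a few of them over one made-up sample string with `String#scan` (the string and the hash keys are illustrative choices):

```ruby
sample = "Hello, World 42\t"

# Each POSIX bracket expression matches one character; scan collects every hit.
classes = {
  upper:  /[[:upper:]]/,
  digit:  /[[:digit:]]/,
  punct:  /[[:punct:]]/,
  blank:  /[[:blank:]]/,
  xdigit: /[[:xdigit:]]/
}

classes.each do |name, re|
  puts "#{name}: #{sample.scan(re).inspect}"
end
# Expected output:
# upper: ["H", "W"]
# digit: ["4", "2"]
# punct: [","]
# blank: [" ", " ", "\t"]
# xdigit: ["e", "d", "4", "2"]
```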

Ruby also supports the following non-POSIX character classes:

/[[:word:]]/ - A character in one of the following Unicode general categories: Letter, Mark, Number, Connector_Punctuation
/[[:ascii:]]/ - A character in the ASCII character set

# U+06F2 is "EXTENDED ARABIC-INDIC DIGIT TWO"
/[[:digit:]]/.match("\u06F2")    #=> #<MatchData "\u{06F2}">
/[[:upper:]][[:lower:]]/.match("Hello") #=> #<MatchData "He">
/[[:xdigit:]][[:xdigit:]]/.match("A6")  #=> #<MatchData "A6">
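The two non-POSIX classes from the list above can be compared the same way. This sketch uses an illustrative string with a precomposed "é" and, for contrast, also runs Ruby's `\w`, which the Ruby docs describe as ASCII-only ([a-zA-Z0-9_]):

```ruby
# "\u00E9" is a precomposed "é": a Unicode Letter, but not ASCII.
text = "caf\u00E9_1"

word_chars  = text.scan(/[[:word:]]/)  # Unicode-aware: letters, marks, numbers, "_"
ascii_chars = text.scan(/[[:ascii:]]/) # only characters in the ASCII set
ascii_word  = text.scan(/\w/)          # \w is ASCII-only in Ruby

p word_chars   # => ["c", "a", "f", "é", "_", "1"]
p ascii_chars  # => ["c", "a", "f", "_", "1"]
p ascii_word   # => ["c", "a", "f", "_", "1"]
```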
the Tin Man answered Sep 17 '22 11:09


'[[' doesn't have any special meaning. [xyz] is a character class and will match a single x, y, or z. The caret ^ negates the class, so [^xyz] matches any single character not in the brackets.
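A quick Ruby illustration of a plain character class and its negated form, using arbitrary sample words:

```ruby
# [xyz] matches exactly one character that is x, y, or z.
in_set = "zebra".scan(/[xyz]/)
p in_set      # => ["z"]

# A leading caret negates the class: one character that is NOT a, b, or c.
not_in_set = "cabbage".scan(/[^abc]/)
p not_in_set  # => ["g", "e"]
```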

Removing ^ for simplicity, you can see that the first open bracket is matched with the first close bracket, and the second open bracket is used as a literal member of the character class. The final close bracket is treated as another character to be matched.

irb(main):032:0> /[[abc]]/ =~ "[a]"
=> 1
irb(main):033:0> /[[abc]]/ =~ "a]"
=> 0

This appears to give the same result as your original in some cases:

irb(main):034:0> /[abc]/ =~ "a]"
=> 0
irb(main):034:0> /[abc]/ =~ "a"
=> 0

But this is only because your regular expression is not anchored, so it is not looking for an exact match of the whole string.

irb(main):036:0> /^[abc]$/ =~ "a]"
=> nil
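One caveat for Ruby specifically: Ruby's Regexp documentation states that a character class may contain another character class (Onigmo supports nested classes and `&&` intersection). If that applies here, Ruby parses `[[^abc]]` as an outer class wrapping the nested class `[^abc]`, rather than with the literal-bracket pairing described above, which would explain why Rubular showed no difference. The sketch below, assuming Ruby 1.9 or later, compares the question's two patterns anchored with `\A` and `\z` (whole-string anchors; in Ruby, `^` and `$` anchor to line boundaries). The test strings are made up for illustration.

```ruby
# Assumption (per Ruby's Regexp docs): in Ruby, "[" inside a class opens a
# nested class, so [[^abc]] is an outer class containing [^abc].
nested = /\A[[^abc]]\z/
single = /\A[^abc]\z/

# \A and \z pin the match to the whole string, so each pattern
# must match exactly one character and nothing else.
inputs = ["x", "a", "]", "x]"]

inputs.each do |s|
  puts "#{s.inspect}: nested=#{(s =~ nested).inspect} single=#{(s =~ single).inspect}"
end
# Expected output:
# "x": nested=0 single=0
# "a": nested=nil single=nil
# "]": nested=0 single=0
# "x]": nested=nil single=nil
```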
dfb answered Sep 17 '22 11:09