Lots of ready-to-use character classes are available in Perl regular expressions, such as \d or \S, or new-fangled Unicode grokkers such as \p{P}, which matches punctuation characters.
Now let's say I'd like to match all punctuation characters \p{P} (quite a number of them, and not something you want to type in by hand): all but one, all but the good old komma (or comma, the , character).
Is there a way to specify this requirement short of expanding the handy character class and taking away the komma by hand?
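In the context of regular expressions, a bracketed character class is a set of characters enclosed within square brackets, such as [aeiou]. It matches exactly one character of the input string, and only if that character belongs to the set. A ^ placed right after the opening bracket negates the set, so [^aeiou] matches any single character that is not in it.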
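A minimal Perl sketch of a bracketed class and its negated form. The helpers first_vowel and count_non_vowels are hypothetical, made up only for illustration:

```perl
use strict;
use warnings;

# [aeiou] is a bracketed character class: it matches exactly one
# character, which must be one of the five listed vowels.
sub first_vowel {
    my ($word) = @_;
    return $word =~ /([aeiou])/ ? $1 : undef;
}

# A leading ^ negates the class: [^aeiou] matches one character that is
# NOT a listed vowel. The "() =" idiom counts how many /g matches occur.
sub count_non_vowels {
    my ($word) = @_;
    my $count = () = $word =~ /[^aeiou]/g;
    return $count;
}

print first_vowel("rhythm and blues"), "\n";   # prints "a"
print count_non_vowels("regex"), "\n";          # prints "3"
```

The negated form is the building block behind the double-negation answer further down.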
\s stands for “whitespace character”. Which characters this actually includes depends on the regex flavor, but in the common flavors it includes at least [ \t\r\n\f]. That is: \s matches a space, a tab, a carriage return, a line feed, or a form feed. In Perl 5.18 and later it also matches the vertical tab, and under Unicode rules it matches other Unicode whitespace characters as well.
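A small Perl check that \s matches each of the five characters listed above, and that splitting on \s+ treats a tab or a carriage-return/line-feed pair just like a space. The sample string is made up for illustration:

```perl
use strict;
use warnings;

# The five characters listed above: space, tab, carriage return,
# line feed, and form feed.
my @listed = (" ", "\t", "\r", "\n", "\f");

# Each one, taken on its own, should be matched by \s.
for my $ch (@listed) {
    printf "U+%04X %s\n", ord($ch), ($ch =~ /\A\s\z/ ? "matches" : "does not match");
}

# Any run of whitespace, including "\r\n", acts as one separator.
my @words = split /\s+/, "one\ttwo\r\nthree four";
print join(",", @words), "\n";    # prints "one,two,three,four"
```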
A character class is a special notation that matches any symbol from a certain set. To start, consider the “digit” class: it is written \d and matches any single digit. For instance, applied to the phone number "+7(903)-123-45-67", the pattern /\d/ finds the first digit, 7.
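The same phone-number example written in Perl, as a sketch. One Perl-specific difference is worth knowing: \d is Unicode-aware, so it also matches non-ASCII decimal digits such as ARABIC-INDIC DIGIT THREE (U+0663), unless the /a modifier (Perl 5.14 and later) restricts it to 0-9:

```perl
use strict;
use warnings;

my $str = "+7(903)-123-45-67";

# \d matches a single digit; without /g the match stops at the first one.
my ($first) = $str =~ /(\d)/;
print "$first\n";              # prints "7"

# With /g in list context, every digit is returned.
my @all = $str =~ /\d/g;
print join("", @all), "\n";    # prints "79031234567"

# \d is Unicode-aware in Perl; /a limits it to ASCII 0-9.
my $arabic_three = "\x{0663}";
print(($arabic_three =~ /\d/  ? "yes" : "no"), "\n");   # prints "yes"
print(($arabic_three =~ /\d/a ? "yes" : "no"), "\n");   # prints "no"
```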
The dot (.) matches any character except the newline character. Use the s flag to make the dot match any character, including the newline.
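In Perl the flag is the /s modifier. A short sketch using a made-up two-line string:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";

# Without /s, . refuses to cross the newline, so the greedy .* stops there.
my ($no_s) = $text =~ /^(.*)/;
print "[$no_s]\n";      # prints "[first line]"

# With /s, . also matches "\n", so .* runs to the end of the string.
my ($with_s) = $text =~ /^(.*)/s;
print "[$with_s]\n";    # prints the whole string, newline included
```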
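On its own, \p{P} matches 598 characters here (the exact count depends on the Unicode version your Perl ships with):
$ unichars -au '\p{P}' | wc -l
598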
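Double negation: \P{P} is everything that is not punctuation, so the negated class [^\P{P},] excludes both non-punctuation and the comma, which leaves every punctuation character except the comma: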
/[^\P{P},]/
$ unichars -au '[^\P{P},]' | wc -l
597
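A quick Perl sketch that applies the double-negation class to a few sample characters, chosen only for illustration. The comma, the letter, and the space should fail to match; the period, the exclamation mark, EM DASH (U+2014), and IDEOGRAPHIC FULL STOP (U+3002) should match:

```perl
use strict;
use warnings;

# \P{P} is "not punctuation", so [^\P{P},] means
# "neither non-punctuation nor a comma", i.e. "punctuation, but not a comma".
my $punct_but_comma = qr/[^\P{P},]/;

# ASCII comma, period, exclamation mark, EM DASH, IDEOGRAPHIC FULL STOP,
# a letter, and a space. Code points are printed to avoid wide-character output.
for my $ch (",", ".", "!", "\x{2014}", "\x{3002}", "a", " ") {
    my $verdict = $ch =~ /\A$punct_but_comma\z/ ? "match" : "no match";
    printf "U+%04X %s\n", ord($ch), $verdict;
}
```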
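"And" through lookaround: match \p{P}, then use a negative lookbehind to assert that the character just matched is not a comma (a negative lookahead placed before it, /(?!,)\p{P}/, works the same way):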
/\p{P}(?<!,)/
$ unichars -au '\p{P}(?<!,)' | wc -l
597
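A sketch comparing the lookbehind form with the equivalent lookahead form on a made-up sentence; both should return the same punctuation, minus the commas:

```perl
use strict;
use warnings;

# Lookbehind form: match one \p{P}, then assert that the character
# just consumed was not a comma.
my $behind = qr/\p{P}(?<!,)/;

# Lookahead form: refuse a comma first, then match one \p{P}.
my $ahead = qr/(?!,)\p{P}/;

my $text = "Hello, world! (Really?) Yes, really.";
my @via_behind = $text =~ /$behind/g;
my @via_ahead  = $text =~ /$ahead/g;

print join(" ", @via_behind), "\n";   # prints "! ( ? ) ."
print join(" ", @via_ahead),  "\n";   # prints "! ( ? ) ."
```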
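unichars, used for the counts above, is a command-line tool from the Unicode::Tussle distribution on CPAN that lists the code points matching a given pattern.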
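If unichars is not installed, a rough pure-Perl substitute is to scan the code space yourself. This is a sketch, not how unichars works internally: punctuation_counts is a hypothetical helper, it assumes the range U+0000..U+10FFFF minus surrogates is what you want to count, and its totals follow the Unicode version built into your Perl, so they may well differ from 598 and 597. What should hold on any version is that both comma-excluding patterns match exactly one character fewer than \p{P}. Only U+002C is removed, though: other comma-like punctuation such as FULLWIDTH COMMA (U+FF0C) still matches.

```perl
use strict;
use warnings;

# Hypothetical helper: scan every code point (skipping UTF-16 surrogates)
# and count single characters matching \p{P} and each comma-excluding pattern.
sub punctuation_counts {
    my ($all, $double_neg, $lookbehind) = (0, 0, 0);
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;
        my $ch = chr $cp;
        # Pre-filter: both other patterns also require \p{P}.
        next unless $ch =~ /\p{P}/;
        $all++;
        $double_neg++ if $ch =~ /\A[^\P{P},]\z/;
        $lookbehind++ if $ch =~ /\A\p{P}(?<!,)\z/;
    }
    return ($all, $double_neg, $lookbehind);
}

my ($all, $double_neg, $lookbehind) = punctuation_counts();
printf "%-14s %d\n", '\p{P}',       $all;
printf "%-14s %d\n", '[^\P{P},]',   $double_neg;
printf "%-14s %d\n", '\p{P}(?<!,)', $lookbehind;

# Other commas are not excluded by either pattern.
my $fullwidth_comma = "\x{FF0C}";
print $fullwidth_comma =~ /[^\P{P},]/ ? "U+FF0C matches\n" : "U+FF0C does not match\n";
```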