I am running into something I could not see in Linux. Can anyone tell me why the first regex is not picking up "ß-carotene"?
$ cat cmpg
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
ß-carotene  
$ cat cmpg|awk  '/[^\w\s({)}\r\n\[\]],/'
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
$ cat cmpg|awk  '/ß/'
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
ß-carotene
Thanks for the help!
$ cat cmpg|awk  '/[^\w\s({)}\r\n\[\]],/'
only matches lines that contain at least one comma.
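As for why the negated character class matches the 2 (which puzzled me because \w contains all ASCII digits, thus [^\w...] should fail to match 2): awk uses POSIX extended regular expressions, which do not define the \w (or \s) shorthands, so inside the brackets they do not mean "word character" or "whitespace". You would need to use the POSIX classes [:alnum:] (plus _ for the underscore) or [:space:] inside the bracket expression instead.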
All in all, that regex is strange in any regex flavor. What are you trying to achieve with it?
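To see that the trailing comma is what decides the result, the sample file can be split into lines with and without a comma. This is only a sketch: the temporary file and the variable names are made up for the illustration, and the data is copied from the question.

```shell
# Recreate the sample file from the question (the temp path is only for this sketch).
f=$(mktemp)
printf '%s\n' \
  'ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile' \
  'ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile' \
  'ß-carotene' > "$f"

# Lines that contain a comma somewhere: the two long names.
with_comma=$(awk '/,/' "$f")

# Lines that contain no comma at all: only ß-carotene.
without_comma=$(awk '!/,/' "$f")

printf 'with comma:\n%s\n\nwithout comma:\n%s\n' "$with_comma" "$without_comma"
rm -f "$f"
```

Any pattern that ends in a literal comma can only ever select lines from the first group, whatever the bracket expression in front of it does.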
$ cat cmpg|awk  '/[^\w\s({)}\r\n\[\]],/'
looks for any sequence of 2 characters:
the first character should NOT ([^) be one of:
\w : a "word" character (letters, digits, and underscore)
w  : a literal w, if that awk version doesn't know about the special meaning of \w
\s : a whitespace character (could be a lot of things if using Unicode, not just space and tab)
s  : a literal s, if that awk version doesn't know about the special meaning of \s
(  : a (
{  : a {
)  : a )
}  : a }
\r : a carriage return
\n : a newline
\[ : a [
\] : a ]
the 2nd character HAS to be:
,  : a , (comma)
The last line does NOT contain a comma. (The ß would match otherwise, as it's not part of the above list.)
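The "two characters" reading can be checked on single strings. A minimal sketch, assuming an awk that supports POSIX character classes (gawk and recent mawk do): the pattern is a simplified stand-in for the original that leaves out the bracket characters, and `matches` is a helper name invented for this example.

```shell
# Simplified stand-in for the original pattern: one character that is not a
# letter, digit, underscore, or whitespace, immediately followed by a comma.
pat='[^[:alnum:]_[:space:]],'

# Hypothetical helper: succeeds (exit status 0) if the string matches the pattern.
matches() {
  printf '%s\n' "$1" | awk -v re="$pat" '$0 ~ re { found = 1 } END { exit !found }'
}

for s in 'x-,y' 'x2,y' 'ß-carotene'; do
  if matches "$s"; then
    printf 'match:    %s\n' "$s"
  else
    printf 'no match: %s\n' "$s"
  fi
done
```

Here 'x-,y' matches because the hyphen is outside the excluded set and is followed by a comma, 'x2,y' does not because the digit is excluded, and 'ß-carotene' does not because it contains no comma at all.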