I am running into something on Linux that I cannot figure out. Can anyone tell me why the first regex is not picking up "ß-carotene"?
$ cat cmpg
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
ß-carotene
$ cat cmpg|awk '/[^\w\s({)}\r\n\[\]],/'
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
$ cat cmpg|awk '/ß/'
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((3R)-1H-pyrazole-1-propanenitrile
ß-Cyclopentyl-4-(7H-pyrrolo[2,3-d]pyrimidin-4-yl)-((R)-1H-pyrazole-1-propanenitrile
ß-carotene
Thanks for the help!
$ cat cmpg|awk '/[^\w\s({)}\r\n\[\]],/'
only matches lines that contain at least one comma.
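As for why the negated character class matches the 2 (which puzzled me because \w contains all ASCII digits, thus [^\w...] should fail to match 2): awk uses POSIX extended regular expressions, which don't know the \w (or \s) shorthands. Some implementations such as gawk accept them as extensions, but as far as I know not inside a bracket expression, where they do not act as classes. You would need to use [:alnum:] or [:space:] inside the brackets instead, for example [^[:alnum:][:space:]].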
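To check the point about POSIX classes, here is a minimal sketch. The helper name punct_before_comma and the sample lines are made up for illustration, and it assumes an awk that supports POSIX character classes (gawk does; very old mawk builds may not).

```shell
# Hypothetical helper: print lines where a comma directly follows a character
# that is neither alphanumeric nor whitespace.
punct_before_comma() {
  awk '/[^[:alnum:][:space:]],/'
}

# Made-up sample input: in the first line the only comma follows the digit 2,
# the second line has no comma at all, the third has ";" right before a comma.
printf '%s\n' 'pyrrolo[2,3-d]pyrimidine' 'beta-carotene' 'a;,b' | punct_before_comma
```

With [:alnum:] in place of \w, the "2," in the first line should no longer match, so only a;,b is expected to be printed.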
All in all, that regex is strange in any regex flavor. What are you trying to achieve with it?
$ cat cmpg|awk '/[^\w\s({)}\r\n\[\]],/'
looks for any string made of 2 characters:
the first character should NOT ([^) be one of:
  \w : a "word" character (letters, digits, and underscore), or a literal w if that awk version doesn't know about the special meaning of \w
  \s : a whitespace character (could be a lot of things if using Unicode, not just space and tab), or a literal s if that awk version doesn't know about the special meaning of \s
  ( : a (
  { : a {
  ) : a )
  } : a }
  \r : a carriage return
  \n : a newline
  \[ : a [
  \] : a ]
the 2nd character HAS to be:
  , : a , (comma)
The last line does NOT contain a comma. (The Beta would match otherwise, as it is not part of the above list.)
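The effect of the trailing comma can be sketched as follows. The function names and the sample lines are hypothetical, the character list is simplified to explicit ASCII ranges, and LC_ALL=C is an assumption made here so that awk works on bytes and the UTF-8 bytes of ß fall outside the list.

```shell
# Hypothetical helper: a character outside the list, immediately followed by a comma
# (same shape as the regex in the question).
odd_char_then_comma() {
  LC_ALL=C awk '/[^A-Za-z0-9 (){}-],/'
}

# Hypothetical helper: a character outside the list anywhere in the line
# (the comma moved into the list instead of being required after it).
odd_char_anywhere() {
  LC_ALL=C awk '/[^A-Za-z0-9 (){},-]/'
}

# Made-up sample input.
printf '%s\n' 'ß-carotene' 'ß,ß-carotene' 'beta-carotene' | odd_char_then_comma
printf '%s\n' 'ß-carotene' 'ß,ß-carotene' 'beta-carotene' | odd_char_anywhere
```

The first call should print only the line where ß is directly followed by a comma; the second should print both lines that contain ß. If the goal is to find lines with unusual characters, dropping the required comma is probably what is wanted.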