Does GNU awk support POSIX equivalence classes?
Is it possible to match [[=a=]] in awk, the way it can be done in grep?
$ echo ábÅ | grep '[[=a=]]'
ábÅ
$ echo ábÅ | grep -o '[[=a=]]'
á
Å
Per the GAWK User's Guide, "Caution: The library functions that gawk uses for regular expression matching currently only recognize POSIX character classes; they do not recognize collating symbols or equivalence classes."
Accordingly, you're going to have to write out the allowed equivalents explicitly in the regex, e.g. /[aáÅ]/ or whatever set you're looking for.
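A minimal sketch of that workaround, reproducing both grep invocations from the question. The helper name equiv_a is hypothetical, and the list a, á, Å is simply the set from the question, not a complete equivalence class. The per-match version uses alternation rather than a bracket expression, on the assumption that this behaves more predictably in awks or locales that process bytes instead of characters:

```shell
# Hypothetical helper: print every occurrence of one of the explicitly
# listed equivalents of "a", one per line (roughly what grep -o does).
equiv_a() {
    awk '{
        s = $0
        # match() sets RSTART and RLENGTH for the leftmost match in s
        while (match(s, /a|á|Å/)) {
            print substr(s, RSTART, RLENGTH)
            # continue scanning after the text just matched
            s = substr(s, RSTART + RLENGTH)
        }
    }'
}

# Whole-line match, like plain grep: prints lines containing a listed character
echo 'ábÅ' | awk '/[aáÅ]/'

# Per-match output, like grep -o
echo 'ábÅ' | equiv_a
```

The list has to be maintained by hand for every base letter you care about, which is exactly what an equivalence class would otherwise do for you.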
There are locale-aware character ranges but that doesn't seem to be what you're asking about.
See the section on bracket expressions in the GAWK User's Guide, towards the end:
Locale-specific names for a list of characters that are equal. The name is enclosed between ‘[=’ and ‘=]’. For example, the name ‘e’ might be used to represent all of “e,” “ê,” “è,” and “é.” In this case, ‘[[=e=]]’ is a regexp that matches any of ‘e’, ‘ê’, ‘é’, or ‘è’.
These features are very valuable in non-English-speaking locales.
CAUTION: The library functions that gawk uses for regular expression matching currently recognize only POSIX character classes; they do not recognize collating symbols or equivalence classes.