awk and equivalence classes

Does GNU awk support POSIX equivalence classes?

Is it possible to match [[=a=]] using awk as it is done in grep?

$ echo ábÅ | grep '[[=a=]]'
ábÅ

$ echo ábÅ | grep -o '[[=a=]]'
á
Å
Eugene Barsky asked Dec 24 '22 19:12


2 Answers

Per the GAWK User's Guide, "Caution: The library functions that gawk uses for regular expression matching currently only recognize POSIX character classes; they do not recognize collating symbols or equivalence classes."

Accordingly, you're going to have to write out the allowed equivalents yourself in the regex, e.g. /[aáÅ]/ or whatever set of characters you're looking for.
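As a rough sketch of that workaround, the snippet below imitates `grep -o '[[=a=]]'` for the sample input in the question. The function name `a_like` is made up for illustration, and the list of equivalents (a, á, Å) is only an assumption about what you want to match; extend it by hand as needed. It uses alternation instead of the bracket expression so that it should still behave if your awk or locale treats the input as bytes rather than multibyte characters.

```shell
# Hypothetical helper: print each "a-like" character on its own line,
# roughly like `grep -o '[[=a=]]'`. The equivalents are listed by hand
# because gawk does not expand [[=a=]] itself.
a_like() {
    awk '{
        s = $0
        # a|á|Å instead of [aáÅ]: alternation also matches correctly when
        # awk works on bytes, where á and Å are two-byte sequences.
        while (match(s, /a|á|Å/)) {
            print substr(s, RSTART, RLENGTH)   # the matched character
            s = substr(s, RSTART + RLENGTH)    # continue after the match
        }
    }'
}

echo 'ábÅ' | a_like
```

For the input from the question this should print á and Å on separate lines, the same as the grep -o example. If you only need to test whether a line contains such a character, a plain `awk '/a|á|Å/'` is enough.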

There are locale-aware character ranges, but that doesn't seem to be what you're asking about.

Raymond Hettinger answered Jan 13 '23 10:01


See the gawk manual's section on bracket expressions, towards the end, where equivalence classes are described:

Locale-specific names for a list of characters that are equal. The name is enclosed between ‘[=’ and ‘=]’. For example, the name ‘e’ might be used to represent all of “e,” “ê,” “è,” and “é.” In this case, ‘[[=e=]]’ is a regexp that matches any of ‘e’, ‘ê’, ‘é’, or ‘è’.

These features are very valuable in non-English-speaking locales.

CAUTION: The library functions that gawk uses for regular expression matching currently recognize only POSIX character classes; they do not recognize collating symbols or equivalence classes.

James Brown answered Jan 13 '23 09:01