awk and equivalence classes

Question

Does GNU awk (gawk) support POSIX equivalence classes?

Is it possible to match [[=a=]] in awk, the way grep does?

$ echo ábÅ | grep '[[=a=]]'
ábÅ

$ echo ábÅ | grep -o '[[=a=]]'
á
Å

Raymond Hettinger · Accepted Answer

Per The GNU Awk User's Guide: "Caution: The library functions that gawk uses for regular expression matching currently only recognize POSIX character classes; they do not recognize collating symbols or equivalence classes."

Accordingly, you will have to write out the allowed equivalents explicitly in the regex, e.g. /[aáÅ]/, or whatever set of characters you are looking for.

There are locale-aware character ranges, but that doesn't seem to be what you're asking about.
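A minimal sketch of that workaround, assuming a POSIX shell with some awk on the PATH and input made of precomposed characters. The helper name match_a_equivalents and its list of "a"-like characters are hypothetical, chosen only for illustration. It prints each match on its own line, like grep -o. An alternation is used instead of a bracket expression: /[aáÅ]/ relies on awk treating a multibyte character as one character (gawk does so in a UTF-8 locale), whereas an alternation of literal strings also matches whole characters in awks that work byte by byte.

```shell
#!/bin/sh
# Hypothetical helper: print every "a"-like character on its own line,
# emulating `grep -o '[[=a=]]'`. The list of equivalents is an assumption
# (precomposed Latin letters only); extend it for your data and locale.
match_a_equivalents() {
  awk '{
    s = $0
    # Alternation instead of a bracket expression: it also works in awks
    # that process input byte by byte rather than as multibyte characters.
    while (match(s, /a|á|à|â|ä|ã|å|A|Á|À|Â|Ä|Ã|Å/)) {
      print substr(s, RSTART, RLENGTH)   # only the matched text, like grep -o
      s = substr(s, RSTART + RLENGTH)    # keep scanning after this match
    }
  }'
}

echo 'ábÅ' | match_a_equivalents
```

With the input from the question this prints á and Å on separate lines, the same as the grep -o output above.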

James Brown · Answer

See the gawk manual's section on bracket expressions, towards the end:

Equivalence classes: Locale-specific names for a list of characters that are equal. The name is enclosed between ‘[=’ and ‘=]’. For example, the name ‘e’ might be used to represent all of “e,” “ê,” “è,” and “é.” In this case, ‘[[=e=]]’ is a regexp that matches any of ‘e’, ‘ê’, ‘é’, or ‘è’.

These features are very valuable in non-English-speaking locales.

CAUTION: The library functions that gawk uses for regular expression matching currently recognize only POSIX character classes; they do not recognize collating symbols or equivalence classes.
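Since gawk does not recognize [[=e=]], one possible stand-in for the line-filtering form of grep '[[=e=]]' is a dynamic regexp passed in with -v. This sketch assumes the set e, ê, é, è from the manual's example; a real locale's equivalence class may be larger (and may include uppercase letters). The function name filter_e_equivalents is hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for `grep '[[=e=]]'`: print the lines that contain
# any character from an explicitly listed set. The set "e|ê|é|è" is taken
# from the manual's example, not from any particular locale's definition.
filter_e_equivalents() {
  # $0 ~ re uses the -v string as a dynamic regular expression;
  # a pattern with no action prints the matching line.
  awk -v re='e|ê|é|è' '$0 ~ re'
}

printf '%s\n' 'café' 'fête' 'cat' 'père' | filter_e_equivalents
```

This prints café, fête and père, and drops cat. Keeping the set in a -v variable means the same awk program can be reused for another letter by changing only that value.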


Tags: regex, grep, awk, equivalence-classes

Asked by Eugene Barsky
