I tried to understand how 'collating symbols' matching works, but I could not figure it out. My understanding is that a collating symbol matches an exact sequence instead of just the individual character(s), that is:
echo "ciiiao" | grep '[oa]' --> output 'ciiiao'
echo "ciiiao" | grep '[[.oa.]]' --> no output
echo "ciiiao" | grep '[[.ia.]]' --> output 'ciiiao'
However, the third command does not work. Am I wrong, or am I misinterpreting something?
I have read this regexp tutorial.
It defines a collating element to be “a sequence of one or more bytes defined in the current collating sequence as a unit of collation.” This generalizes the notion of a character in two ways. First, a single character can map into two or more collating elements.
A bracket expression is either a matching list expression or a non-matching list expression. It consists of one or more expressions: ordinary characters, collating elements, collating symbols, equivalence classes, character classes, or range expressions.
An extended regular expression specifies a set of strings to be matched. The expression contains both text characters and operator characters. Text characters match the corresponding characters in the strings being compared. Operator characters specify repetitions, choices, and other features.
Use \s to match any single whitespace character.
Collating symbols are typically used when a digraph is treated like a single character in a language. They are an element of the POSIX regular expression specification, and are not widely supported.
For example, the Welsh alphabet has a number of digraphs that are treated as a single letter (marked with a * below):
a b c ch* d dd* e f ff* g ng* h i j l ll* m n o p ph* r rh* s t th* u w y
Assuming the locale file defines it (a collating symbol will only work if it is defined in the current locale), the collating symbol [[.ng.]] is treated like a single character. Likewise, a single-character expression such as . (dot) or [^a] will also match "ff" or "th". This also affects sorting, so that [p-t] will include the digraphs "ph" and "rh" in addition to the expected single letters.
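To see how your own grep and locale treat these patterns, here is a minimal sketch. The helper name try_pattern is made up for illustration; it only classifies grep's exit status (0 for a match, 1 for no match, anything higher for an error such as an invalid pattern). What the collating-symbol lines print depends on the grep implementation and the current locale, so no result is claimed for them here.

```shell
#!/bin/sh
# try_pattern INPUT PATTERN
# Hypothetical helper: runs grep and reports "match", "no match", or "error"
# based on grep's exit status (0, 1, or greater than 1).
try_pattern() {
    input=$1
    pattern=$2
    status=0
    printf '%s\n' "$input" | grep -e "$pattern" >/dev/null 2>&1 || status=$?
    case $status in
        0) echo "match" ;;
        1) echo "no match" ;;
        *) echo "error" ;;
    esac
}

# Ordinary bracket expression: matches a single "o" or "a".
try_pattern 'ciiiao' '[oa]'       # prints "match"

# Single-character collating symbol: result depends on the implementation.
try_pattern 'ciiiao' '[[.a.]]'

# Multi-character collating symbols: only valid if the current locale
# defines "oa" or "ia" as a collating element; otherwise grep may reject them.
try_pattern 'ciiiao' '[[.oa.]]'
try_pattern 'ciiiao' '[[.ia.]]'
```

If the last two calls print "error", that would suggest the locale does not define "oa" or "ia" as collating elements, which is consistent with the explanation above: a collating symbol is not a way to match an arbitrary character sequence.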