This command
$ echo '一二三四五六七八九十' | grep -oE '[一-十]'
outputs:
一
二
三
五
六
七
八
九
十
The regex [一-十] (one to ten) is expected to match the Chinese numerals.
As the example shows, it matches every Chinese numeral from one to ten except the character 四 (four).
Why?
Is this a bug or a joke?
I could believe this is a joke, because in Chinese '四' (four) sounds like '事' (thing). In fact, in some dialects of Chinese, the two are pronounced the same. Thus '一二三五六七八九十' (one two three five six seven eight nine ten) implies '沒四' (no four), i.e. '沒事' (no thing).
BTW, the version of grep I use:
GNU grep 2.5.4
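The Chinese numerals are not in numerical order in Unicode. 一 is U+4E00 and 十 is U+5341, so the range [一-十] covers the code points U+4E00 through U+5341. 四 is U+56DB, which lies above U+5341, so it falls outside the range and is not matched. The other eight numerals happen to sit between those two endpoints.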
Read the Unicode standard for more information, and see http://www.unicode.org/charts/PDF/U4E00.pdf.
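To make the code point argument concrete, here is a small Python sketch that prints the code point of each numeral and checks whether it lies between 一 and 十. It is only an illustration: Python's re module compares bracket ranges by code point, which appears to be what grep is doing in this case, but it is not grep's own implementation, and the variable names are mine.

```python
import re

numerals = "一二三四五六七八九十"

# Endpoints of the range [一-十]: U+4E00 and U+5341.
low, high = ord("一"), ord("十")

# Show each numeral's code point and whether it lies inside the range.
for ch in numerals:
    cp = ord(ch)
    status = "inside" if low <= cp <= high else "outside"
    print(f"{ch} U+{cp:04X} {status}")

# A range is a code point interval, so 四 (U+56DB) is skipped.
range_matches = re.findall("[一-十]", numerals)

# Listing the characters explicitly does not depend on their order in Unicode.
explicit_matches = re.findall("[一二三四五六七八九十]", numerals)

print("".join(range_matches))
print("".join(explicit_matches))
```

Only 四 is reported as outside. If the goal is to match all ten numerals, spelling them out in the bracket expression, as in the second pattern, avoids relying on code point order; the same explicit list should also work as a grep pattern.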