What’s the correct way to write Unicode-aware one-liners in Perl? The obvious way:
$ echo 'フーバー' | perl -lne 'print if /フ/'
フーバー
...appears to work at first sight, but only by accident: the Unicode text is processed as bytes, as the next example shows:
$ echo 'フーバー != フウバー' | perl -mString::Diff=diff -lne 'print join(" ", diff($1, $2)) if /(.*)!=(.*)/'
フ?[??]バー[ ] { }フ?{??}バー
Using the -C switch on its own to set STDIN, STDOUT, etc. to UTF-8 is not enough:
$ echo 'フーバー' | perl -C -lne 'print if /フ/'
[no output]
...because the input is now decoded into characters, while the text in -e is still not interpreted as Unicode.
So is the following the way to go (assuming a sane locale, that is, one of the form "*.UTF-8")?
$ perl -C -Mutf8 [...]
Yes, loading the utf8 pragma is required to interpret the “フ” UTF-8 sequence in the source code as a character instead of as separate bytes.
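The utf8 pragma is locale-independent, and so is the -C switch when given explicit flags such as -CSD. A bare -C is equivalent to -CSDL, which only takes effect when the locale environment variables indicate UTF-8. The shell’s echo command is not locale-independent either.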
I like to use utf8::all when I need to handle Unicode:
echo 'フーバー' | perl -Mutf8::all -lne 'print if /フ/'
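PS. With -C you may also want to give specific flags (for example -CS for the standard streams, or -CSD to cover the default I/O layers as well); AFAIK a bare -C is shorthand for -CSDL.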