What’s the correct way to write Unicode-aware one-liners in Perl? The obvious way:
$ echo 'フーバー' | perl -lne 'print if /フ/'
フーバー
...kind of appears to work at first sight, but this is just an accident: the text is interpreted as bytes, as the next example shows (the diff splits the multi-byte characters apart):
$ echo 'フーバー != フウバー' | perl -mString::Diff=diff -lne 'print join(" ", diff($1, $2)) if /(.*)!=(.*)/'
フ?[??]バー[ ] { }フ?{??}バー
Just using the -C flag to set STDIN, STDOUT, etc. to UTF-8 is not enough by itself:
$ echo 'フーバー' | perl -C -lne 'print if /フ/'
[no output]
...because now the input is decoded into characters, but the literal フ in the -e code is still read as three separate bytes, so the two no longer match.
So is this the way to go (assuming a sane locale, i.e. one of the form "*.UTF-8"):
$ perl -C -Mutf8 [...]
Yes, loading the utf8 pragma is required to interpret the "フ" UTF-8 sequence in the source code as a character instead of as separate bytes.
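The utf8 pragma and the -C command-line switch with explicit flags (such as -CS or -CSDA) are locale-independent, but the shell's echo command is not: it writes out whatever bytes your terminal's encoding produced. Bare -C is the exception, since it means -CSDL and the L makes it take effect only in a UTF-8 locale.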
I like to use utf8::all if I need to handle Unicode:
$ echo 'フーバー' | perl -Mutf8::all -lne 'print if /フ/'
PS: when using -C you also need to give specific flags, AFAIK, because bare -C only takes effect in a UTF-8 locale.