Writing Unicode-aware one-liners in Perl

What’s the correct way to write Unicode-aware one-liners in Perl? The obvious way:

$ echo 'フーバー' | perl  -lne 'print if /フ/'  
フーバー

...appears to work at first sight, but only by accident: both the input and the pattern are treated as sequences of UTF-8 bytes rather than characters, as the next example shows:

$ echo 'フーバー != フウバー' | perl -mString::Diff=diff -lne 'print join(" ", diff($1, $2)) if /(.*)!=(.*)/'
フ?[??]バー[ ] { }フ?{??}バー

Just using the -C flag to switch STDIN, STDOUT etc. to UTF-8 is not enough by itself:

$ echo 'フーバー' | perl -C -lne 'print if /フ/' 
[no output]

...because now the input is decoded into characters, while the text in -e is still not interpreted as Unicode (it remains a sequence of UTF-8 bytes).

So is this the way to go (assuming a sane locale, i.e. one of the form "*.UTF-8")?

$ perl -C -Mutf8 [...]
asked Feb 29 '12 at 10:02 by as.



2 Answers

Yes, loading the utf8 pragma is required so that the UTF-8 sequence “フ” in the source code is interpreted as a single character instead of as separate bytes.

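The utf8 pragma is locale-independent, and so is the Perl -C command-line switch when you spell out its flags (e.g. -CSDA). Bare -C, however, means -CSDL, and the L flag makes the UTF-8 layers conditional on the locale environment variables. The shell’s echo command is not locale-independent either: the bytes it emits depend on how your terminal encodes the text.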

answered Nov 16 '22 at 16:11 by daxim



I like to use utf8::all when I need to handle Unicode:

echo 'フーバー' | perl -Mutf8::all -lne 'print if /フ/'

PS. When using -C, you may also need to give it specific flags (e.g. -CSDA), AFAIK.

answered Nov 16 '22 at 18:11 by w.k
