Writing Unicode-aware one-liners in Perl

Question

What’s the correct way to write Unicode-aware one-liners in Perl? The obvious way:

$ echo 'フーバー' | perl -lne 'print if /フ/'
フーバー

...appears to work at first sight, but only by accident: the Unicode text is treated as bytes, as the next example shows:

$ echo 'フーバー != フウバー' | perl -mString::Diff=diff -lne 'print join(" ", diff($1, $2)) if /(.*)!=(.*)/'
フ?[??]バー[ ] { }フ?{??}バー

Using the -C flag alone to set STDIN, STDOUT, etc. to UTF-8 is not enough:

$ echo 'フーバー' | perl -C -lne 'print if /フ/'
[no output]

...because now the text in -e is not interpreted as Unicode.

So, assuming a sane locale (that is, one of the form "*.UTF-8"), is this the way to go?

$ perl -C -Mutf8 [...]

daxim · Accepted Answer

Yes, loading the utf8 pragma is required to interpret the “フ” UTF‑8 sequence in the source code as a character instead of as separate bytes.

The Perl -C command-line switch and the utf8 pragma are locale-independent, but the shell’s echo command is not.
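One way to take the shell's handling of the input out of the picture is to feed Perl fixed bytes with printf. The sketch below is an illustration, not part of the original answer: the octal escapes are the UTF-8 encoding of 'フー', and -CSD is spelled out here as an assumption so that the I/O layers do not depend on anything else.

```shell
# \343\203\225 is フ (U+30D5) and \343\203\274 is ー (U+30FC) in UTF-8.
printf '\343\203\225\343\203\274\n' | perl -CSD -Mutf8 -lne 'print if /フ/'
# prints フー

# Dropping -Mutf8 reproduces the failure from the question: no output.
printf '\343\203\225\343\203\274\n' | perl -CSD -lne 'print if /フ/'
```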

w.k · Answer

I like to use utf8::all when I need to handle Unicode:

$ echo 'フーバー' | perl -Mutf8::all -lne 'print if /フ/'

PS: when using -C, you also need to give specific flags, AFAIK.

Tags:

shell

unicode

utf-8

perl
