 

How does Perl's length() function count Unicode characters?

Tags:

perl

Why does length() say this is 4 logical characters (I would expect it to say 1)?

$ perl -lwe 'print length("🐪")'
4

I guess something is wrong with my expectation. :-) What is it?

jreisinger asked Dec 18 '22 20:12


1 Answer

Unless you tell Perl that the source code of the script is in UTF-8, Perl does not decode it and treats each byte of a string literal as a separate character. The camel (U+1F42A) takes 4 bytes in UTF-8, so by default the Perl interpreter sees 🐪 as 4 separate characters. If you change your one-liner to perl -Mutf8 -lwe 'print length("🐪")', length gives the output you expect: 1.

The utf8 pragma tells Perl that the source is encoded in UTF-8, so string literals are decoded into characters instead of being read byte by byte. See perldoc utf8 for more info.
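To see the byte/character difference without depending on how the script file itself is encoded, here is a minimal sketch. It spells out the camel's four UTF-8 bytes with hex escapes and decodes them with the core Encode module; the variable names are only illustrative and are not from the question.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The four UTF-8 bytes of U+1F42A (the camel), written as hex escapes.
# This is what a string literal holds when the source is NOT marked as UTF-8.
my $bytes = "\xF0\x9F\x90\xAA";
my $byte_length = length($bytes);

# Decoding turns the byte string into a character string,
# which is what "use utf8" does for literals in the source.
my $chars = decode('UTF-8', $bytes);
my $char_length = length($chars);

print "$byte_length\n";                  # 4
print "$char_length\n";                  # 1
printf "U+%04X\n", ord($chars);          # U+1F42A
```

In both cases length() counts the elements of the string it is given. The only difference is whether those elements are undecoded bytes or decoded characters.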

JGNI answered Jan 06 '23 17:01