Use of 'use utf8;' gives me 'Wide character in print'

Tags:

unicode

utf-8

perl

If I run the following Perl program:

perl -e 'use utf8; print "鸡\n";'

I get this warning:

Wide character in print at -e line 1.

If I run this Perl program:

perl -e 'print "鸡\n";'

I do not get a warning.

I thought use utf8 was required to use UTF-8 characters in a Perl script. Why does this not work, and how can I fix it? I'm using Perl 5.16.2. I have the same issue if this is in a file instead of being a one-liner on the command line.


asked Mar 04 '13 20:03

Eric Johnson

1 Answer

Without use utf8, Perl interprets your string as a sequence of single-byte characters. There are four bytes in your string, as you can see from this:

$ perl -E 'say join ":", map { ord } split //, "鸡\n";'
233:184:161:10

The first three bytes are the UTF-8 encoding of your character; the last one is the line feed.

The call to print sends these four characters to STDOUT unchanged. Your console then works out how to display them: if it is set to use UTF-8, it interprets the first three bytes as your single character, and that is what is displayed.

If we add the utf8 pragma, things are different. It tells Perl that the source code is UTF-8, so Perl decodes the literal and interprets your string as just two characters: U+9E21 (40481) and the line feed.

$ perl -Mutf8 -E 'say join ":", map { ord } split //, "鸡\n";'
40481:10
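Both views can be reproduced in a single script that does not depend on how the source file is saved. This is a small sketch that assumes the core Encode module (it ships with Perl). It spells out the three UTF-8 bytes with \x escapes and then decodes them, which is roughly what use utf8 does for the string literals in your source.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The raw bytes a UTF-8 source file contains for "鸡\n" (U+9E21 plus a newline).
my $bytes = "\xE9\xB8\xA1\n";

# Decode them into characters, as `use utf8` does for string literals.
my $chars = decode('UTF-8', $bytes);

# Print code points only, so nothing wide is ever written to STDOUT.
printf "bytes: %d (%s)\n", length $bytes, join ':', map { ord($_) } split //, $bytes;
printf "chars: %d (%s)\n", length $chars, join ':', map { ord($_) } split //, $chars;
```

This prints "bytes: 4 (233:184:161:10)" and "chars: 2 (40481:10)", which match the two one-liners above.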

By default, Perl's I/O layer assumes that it is working with single-byte characters. When you try to print a character whose code point is greater than 255 (here, 40481), Perl cannot write it as a single byte. It writes the character's internal UTF-8 encoding instead and gives you a warning. As ever, you can get a fuller explanation of this warning by adding use diagnostics. It says:

(S utf8) Perl met a wide character (>255) when it wasn't expecting one. This warning is by default on for I/O (like print). The easiest way to quiet this warning is simply to add the :utf8 layer to the output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the warning is to add no warnings 'utf8'; but that is often closer to cheating. In general, you are supposed to explicitly mark the filehandle with an encoding, see open and perlfunc/binmode.
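You mention seeing the same problem in a script file. Here is a minimal sketch of the fix the diagnostic describes, assuming the file is saved as UTF-8. It uses :encoding(UTF-8) rather than the :utf8 layer quoted above. Both stop the warning, but many Perl Unicode guides prefer :encoding(UTF-8) because it validates input strictly, while :utf8 does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # this source file is saved as UTF-8, so decode its literals

# Encode every character printed to STDOUT as UTF-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';

print "鸡\n";    # prints the character with no "Wide character" warning
```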

As others have pointed out, you need to tell Perl to accept multi-byte output. There are many ways to do this (see the Perl Unicode Tutorial for some examples). One of the simplest is the -CS command-line flag, which tells the three standard filehandles (STDIN, STDOUT and STDERR) to use UTF-8.

$ perl -Mutf8 -e 'print "鸡\n";'
Wide character in print at -e line 1.
鸡

versus

$ perl -Mutf8 -CS -e 'print "鸡\n";'
鸡
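Inside a script file, you can get a similar effect to -CS with the open pragma. Its :std option applies the default layers to STDIN, STDOUT and STDERR. The sketch below assumes the file is saved as UTF-8. There is a practical reason to prefer this over putting -CS on the #! line: since Perl 5.10.1, a -C option on the #! line must also be given on the command line. Unlike -CS, the open pragma also sets the default layers for files you open within its lexical scope.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # source literals are UTF-8
use open qw(:std :encoding(UTF-8));    # UTF-8 layers on STDIN, STDOUT, STDERR

print "鸡\n";    # no "Wide character" warning
```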

Unicode is a big and complex area. As you've seen, many simple programs appear to do the right thing, but for the wrong reasons. When you start to fix part of the program, things will often get worse until you've fixed all of the program.
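A common way for things to get worse is double encoding: you add a UTF-8 output layer (or -CS) but leave out use utf8. The sketch below again assumes the core Encode module. It imitates such an output layer by calling Encode's encode on the three undecoded bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Without `use utf8`, the literal "鸡" is these three bytes,
# which Perl treats as three Latin-1 characters (é, ¸ and ¡).
my $undecoded = "\xE9\xB8\xA1";

# An :encoding(UTF-8) layer (or -CS) encodes each of those characters again.
my $double = encode('UTF-8', $undecoded);

# Six bytes instead of three; a UTF-8 console shows them as "é¸¡".
printf "%d bytes: %s\n", length $double,
    join ' ', map { sprintf '%02X', ord($_) } split //, $double;
```

This prints "6 bytes: C3 A9 C2 B8 C2 A1". The fix is to decode on the way in (use utf8) as well as encode on the way out.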


answered Sep 21 '22 23:09

Dave Cross
