
How do I suppress UTF-8 warnings in Perl?

Tags: utf-8, perl

For various reasons, a legacy script of mine emits the warnings "Malformed UTF-8 character" and "Wide character in print".

I would like to suppress/disable those two warnings so that they are not written to STDERR.

How do I do that?

knorv asked Jun 02 '11 22:06


3 Answers

Here are two examples to help you understand the warnings:

milu@ubuntu: ~/Milu/Dev/Perl > cat malformed-utf8-char.pl 
use utf8; # script source must be in UTF-8
use strict;
use warnings;
print "K�se\n";
milu@ubuntu: ~/Milu/Dev/Perl > perl malformed-utf8-char.pl
Malformed UTF-8 character (unexpected non-continuation byte 0x73,
immediately after start byte 0xe4) at malformed-utf8-char.pl line 4.
Kse

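The source file is saved in Latin-1, while my terminal uses UTF-8. The string is actually "Käse". In Latin-1, "ä" is the single byte 0xe4, which under use utf8 is read as the start byte of a multi-byte UTF-8 sequence; the following "s" (0x73) is not a valid continuation byte, hence the warning. Either remove the utf8 pragma or save the source in UTF-8.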
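To make the byte-level difference concrete, here is a minimal sketch using the core Encode module. The variable names are illustrative only, and this shows just one way to convert the data; re-saving the file as UTF-8 in an editor achieves the same thing.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The Latin-1 bytes for "Käse": 0xE4 is "ä" in ISO-8859-1.
my $latin1_bytes = "K\xE4se";

# Decode the bytes into a Perl character string ...
my $text = decode('ISO-8859-1', $latin1_bytes);

# ... then encode that string as UTF-8, e.g. for a UTF-8 terminal or file.
my $utf8_bytes = encode('UTF-8', $text);

# Show the UTF-8 bytes in hex: "ä" becomes the two bytes C3 A4.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $utf8_bytes;
print "$hex\n";    # prints "4B C3 A4 73 65"
```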

milu@ubuntu: ~/Milu/Dev/Perl > cat wide-char-in-print.pl 
use utf8;
use strict;
use warnings;
# binmode STDOUT, ':utf8';
print "Группа сайтов РИА Новости\n";
milu@ubuntu: ~/Milu/Dev/Perl > perl wide-char-in-print.pl
Wide character in print at wide-char-in-print.pl line 5.
Группа сайтов РИА Новости

The source contains Cyrillic characters, hence the utf8 pragma is in order. To print those characters to the terminal, however, STDOUT must also be set to UTF-8, which you can achieve by calling binmode. If you don't do this, a warning is triggered because a wide character (a code point above 0xFF, i.e. 255) doesn't fit through a narrow (byte) output channel. The output still looks correct because Perl just writes the string's internal UTF-8 bytes as they are, and a UTF-8 terminal happens to display them properly.
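The effect of the encoding layer can be checked without a terminal. The following sketch is an illustration rather than part of the original script: it prints the same string to two in-memory filehandles and collects warnings through a `$SIG{__WARN__}` handler. The names `$raw`, `$enc` and `@warnings` are made up for the demo. For STDOUT, the equivalent fix would be `binmode STDOUT, ':encoding(UTF-8)'`.

```perl
use utf8;    # this source file must be saved as UTF-8
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR (demo only).
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $text = "Группа сайтов\n";

# No encoding layer: Perl warns "Wide character in print".
open my $raw, '>', \my $raw_buf or die "open failed: $!";
print {$raw} $text;
close $raw;

# With an encoding layer: the string is encoded properly, no warning.
open my $enc, '>:encoding(UTF-8)', \my $enc_buf or die "open failed: $!";
print {$enc} $text;
close $enc;

print "warnings seen: ", scalar(@warnings), "\n";
```

Only the first print should produce a warning; the second handle knows how to turn characters into bytes.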

Lumi answered Nov 01 '22 16:11


Presumably you are working in utf8. You have to turn on utf8 handling for each filehandle.

binmode STDERR, ":encoding(utf8)";

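You can do this for all the standard handles with use open qw(:std :encoding(utf8)). The :std part is what applies the layer to STDIN, STDOUT and STDERR; without it, the pragma only sets the default for handles opened later in the same lexical scope. See perldoc open for more info.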

Finally, you can utf8-ify everything (your code, your filehandles and your arguments) with the CPAN module utf8::all.

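Note that :utf8 only marks the handle's data as UTF-8 without checking it, whereas :encoding(utf8) passes the data through the Encode module, which checks it, so it is safer. The spelling :encoding(UTF-8), with the hyphen, selects Encode's strict variant of the encoding. See perldoc -f binmode for details.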
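As a rough sketch of the pragma approach, the following applies a strict UTF-8 layer to the standard handles and then inspects the result with the built-in PerlIO::get_layers. The exact layer names printed depend on the Perl version and platform, so no particular output is claimed here.

```perl
use strict;
use warnings;

# :std extends the default layers to STDIN, STDOUT and STDERR.
use open qw(:std :encoding(UTF-8));

# Inspect which PerlIO layers are now active on the standard handles.
my @stdout_layers = PerlIO::get_layers(*STDOUT);
my @stderr_layers = PerlIO::get_layers(*STDERR);

print "STDOUT layers: @stdout_layers\n";
print "STDERR layers: @stderr_layers\n";
```

An entry beginning with "encoding" in both lists indicates that the layer was applied.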

Schwern answered Nov 01 '22 15:11


no warnings 'utf8';

But it's best to figure out why you're getting the warning and fix the underlying problem. Those two warnings indicate something is going wrong in your script. Suppressing the warnings won't fix the error.
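If you do suppress the warnings, it may be worth keeping the scope as small as possible, since the warnings pragma is lexical. The sketch below is only a demonstration: it prints to an in-memory filehandle and counts warnings with a `$SIG{__WARN__}` handler, and the names `$out` and `@warnings` are invented for the example.

```perl
use utf8;    # this source file must be saved as UTF-8
use strict;
use warnings;

# Collect warnings so the effect of the pragma is visible (demo only).
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# A byte-oriented handle with no encoding layer.
open my $out, '>', \my $buf or die "open failed: $!";

{
    # Suppression applies only inside this block.
    no warnings 'utf8';
    print {$out} "сыр\n";    # silent
}

print {$out} "сыр\n";        # warns: Wide character in print

close $out;
print "warnings seen: ", scalar(@warnings), "\n";
```

Only the print outside the block should warn, so the rest of the script keeps reporting encoding problems.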

cjm answered Nov 01 '22 17:11