
In which cases does encode/decode utf8 croak?

This script prints the same output twice. Are there encodings that would not survive the utf8 encode and decode between the two say calls?

#!/usr/bin/env perl
use warnings;
use 5.16.1;
use Encode qw/encode decode/;

my $my_encoding = 'ISO-8859-7';
binmode STDOUT, ":encoding($my_encoding)";

my $var = "\N{GREEK SMALL LETTER TAU}";
$var .= "\N{GREEK SMALL LETTER OMEGA WITH TONOS}";
$var .= "\N{GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA}";

$var = encode( 'utf8', $var );
$var = decode( $my_encoding, $var );

say $var;

my $test = encode( 'utf8', $var, Encode::FB_CROAK );
$var = decode( 'utf8', $test, Encode::FB_CROAK  );

say $var;
sid_com asked May 19 '26 22:05


1 Answer

It croaks if you try to encode something that falls outside of the target encoding's character set.

utf8 is a Perl-specific encoding used by Perl to store 72-bit characters. It is similar to UTF-8 but laxer: it also accepts code points that are not valid Unicode. It supports every character Perl supports, so encoding to it will never croak.

On the other hand, if you were to use UTF-8, it would croak if you try to encode something that isn't a Unicode character (e.g. chr(0x200000)).
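
To make the difference concrete, here is a minimal sketch that checks which encode calls die under FB_CROAK. The helper name encode_croaks and the sample strings are my own choices for illustration and are not part of the question's script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw/encode/;

# Hypothetical helper: returns 1 if encoding $string with $encoding
# croaks under FB_CROAK, and 0 if the encode succeeds.
sub encode_croaks {
    my ( $encoding, $string ) = @_;
    no warnings 'utf8';    # silence any "not Unicode" warnings for the demo
    # encode() may modify its input when a CHECK value is given,
    # so hand it a copy.
    my $copy = $string;
    my $ok = eval { encode( $encoding, $copy, Encode::FB_CROAK ); 1 };
    return $ok ? 0 : 1;
}

my $greek       = "\x{3C4}";        # GREEK SMALL LETTER TAU
my $smiley      = "\x{263A}";       # WHITE SMILING FACE, not in ISO-8859-7
my $non_unicode = chr(0x200000);    # above U+10FFFF, so not a Unicode character

for my $case (
    [ 'utf8',       $greek ],
    [ 'UTF-8',      $greek ],
    [ 'utf8',       $non_unicode ],
    [ 'UTF-8',      $non_unicode ],
    [ 'ISO-8859-7', $smiley ],
  )
{
    my ( $encoding, $string ) = @$case;
    printf "%-10s U+%06X: %s\n", $encoding, ord($string),
      encode_croaks( $encoding, $string ) ? 'croaks' : 'ok';
}
```

With a reasonably recent Encode, I would expect only the last two cases to croak: strict UTF-8 rejects chr(0x200000), and ISO-8859-7 has no mapping for the smiley, while utf8 accepts everything it is given.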

See also: :encoding(UTF-8) vs :encoding(utf8) vs :utf8

ikegami answered May 21 '26 13:05