
Am I using utf8::is_utf8 correctly?

Tags: utf-8, decode, perl

Does this work correctly? Some error messages are already decoded and some need to be decoded to get correct output.

#!/usr/bin/env perl
use warnings;
use strict;
use utf8;
use open qw(:utf8 :std);
use Encode qw(decode_utf8);

# ...

if ( not eval{
    # some error-messages (utf8) are decoded some are not
    1 }
) {
    if ( utf8::is_utf8 $@ ) {
        print $@;
    }
    else {
        print decode_utf8( $@ );
    }
}
sid_com asked Dec 20 '25 01:12



1 Answer

Am I using utf8::is_utf8 correctly?

No. You should never use utf8::is_utf8 to decide how to treat a string. Using utf8::is_utf8 to guess at the semantics of a string is an instance of what's known as The Unicode Bug. Except for inspecting the internal state of variables when debugging Perl itself or an XS module, utf8::is_utf8 has no use.

It does not indicate whether the value in a variable is encoded using UTF-8 or not; it only reports how Perl happens to store the string internally. In fact, whether a value is encoded is impossible to know reliably. For example, does "\xC3\xA9" produce a string that's encoded using UTF-8 or not? Well, there's no way to know! It depends on whether I meant "Ã©" (two characters), "é" (one character encoded as two UTF-8 bytes) or something entirely different.
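A minimal sketch of why the flag says nothing about the value: the two variables below hold the same two-character string and compare equal, yet utf8::is_utf8 reports different results because only the internal storage differs. The variable names are illustrative, not from the question.

```perl
use strict;
use warnings;

# The same two-character string, stored two different ways internally.
my $downgraded = "\xC3\xA9";
my $upgraded   = "\xC3\xA9";
utf8::upgrade($upgraded);    # changes the internal storage only, not the value

my $same_value  = $downgraded eq $upgraded       ? 1 : 0;
my $flag_before = utf8::is_utf8($downgraded)     ? 1 : 0;
my $flag_after  = utf8::is_utf8($upgraded)       ? 1 : 0;

print "same value:              $same_value\n";
print "flag on downgraded copy: $flag_before\n";
print "flag on upgraded copy:   $flag_after\n";
```

Neither result tells you whether the string was meant as "Ã©" or as the UTF-8 encoding of "é".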

If the variable may contain both encoded and decoded strings, it's up to you to track that using a second variable. I strongly advise against this, though. Just decode everything as it comes in from the outside.
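One way to read "decode everything as it comes in" is to decode once, at the point where bytes enter the program, so that every string past that point is known to be decoded. The sketch below assumes the input is UTF-8; from_outside is a hypothetical helper name, and $raw stands in for bytes read from a file, socket or external library.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode raw bytes once, where they enter the program.
# FB_CROAK makes decode() die on malformed input instead of guessing.
sub from_outside {
    my ($bytes) = @_;    # a copy, so the caller's variable is untouched
    return decode('UTF-8', $bytes, Encode::FB_CROAK());
}

my $raw  = "caf\xC3\xA9";        # UTF-8 bytes as they might arrive from outside
my $text = from_outside($raw);   # characters from here on

my $chars = length $text;        # counts characters, not bytes
my $out   = encode('UTF-8', $text);   # encode again only when writing out

binmode STDOUT, ':encoding(UTF-8)';
print "$text has $chars characters\n";
```

With this arrangement there is no need for a second variable that tracks whether a given string has been decoded.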

If you really can't, your best bet is to try to decode $@ and ignore any failure. It's very unlikely that readable text that isn't UTF-8 would happen to be valid UTF-8.

# $@ is sometimes encoded. If it's not,
# the following will leave it unchanged.
utf8::decode($@);

print $@;
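A small sketch of how utf8::decode behaves in both cases, using made-up sample strings rather than real error messages. It decodes in place and returns true when the string is valid UTF-8; otherwise it returns false and leaves the string alone.

```perl
use strict;
use warnings;

my $encoded = "d\xC3\xA9j\xC3\xA0 vu";   # UTF-8 bytes for "déjà vu"
my $decoded = "d\xE9j\xE0 vu";           # the same text, already decoded

# True: the bytes were valid UTF-8 and have been decoded in place.
my $ok_encoded = utf8::decode($encoded) ? 1 : 0;

# False: "\xE9j" is not a valid UTF-8 sequence, so the string is left unchanged.
my $ok_decoded = utf8::decode($decoded) ? 1 : 0;

my $equal = $encoded eq $decoded ? 1 : 0;

print "decoded the encoded string:    $ok_encoded\n";
print "touched the decoded string:    $ok_decoded\n";
print "both now hold the same text:   $equal\n";
```

The approach stays a heuristic: a decoded string whose characters happen to form valid UTF-8 when read as bytes (such as "Ã©") would be decoded a second time.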
ikegami answered Dec 21 '25 17:12



