
Perl 6 error message: Malformed UTF-8 in block <unit>

Tags:

raku

I'm trying to read a downloaded HTML file:

my $file = "sn.html";
my $in_fh = open $file, :r;
my $text = $in_fh.slurp;

and I get the following error message:

Malformed UTF-8
  in block <unit> at prog.p6 line 10

How can I avoid this and get access to the file's contents?

Eugene Barsky asked Mar 16 '18 11:03



2 Answers

If you know (or can guess) the file's encoding, you can pass it to slurp explicitly with the enc named argument.

From the documentation (https://docs.perl6.org/routine/slurp):

my $text_contents   = slurp "path/to/file", enc => "latin1";

I used it today for a stupid file encoded in ISO-8859-1.
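The same idea can be applied to the file handle from the question. The following is only a minimal sketch: the demo file name and its four bytes are made up for illustration, and it assumes the data really is ISO-8859-1 (Latin-1).

```raku
# Hypothetical demo file: "caf" followed by byte 0xE9, which is "é" in
# ISO-8859-1 but is not valid UTF-8 on its own.
my $file = $*TMPDIR.add("latin1-demo.html");
$file.spurt(Blob.new(0x63, 0x61, 0x66, 0xE9));

# Variant 1: slurp directly, naming the encoding.
my $text = slurp $file, enc => "latin1";

# Variant 2: keep an explicit handle, as in the question, and pass the
# encoding to open instead.
my $in_fh = open $file, :r, :enc<latin1>;
my $text-from-handle = $in_fh.slurp;
$in_fh.close;

say $text;                         # café
say $text eq $text-from-handle;    # True

# Remove the demo file again.
$file.unlink;
```

If the guess is wrong, Latin-1 will still decode without an error (every byte maps to some character), so it is worth checking that the resulting text looks right.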

Plaute answered Jan 03 '23 15:01



If you do not specify an encoding when opening a file, Raku assumes UTF-8. Apparently the file you are trying to open contains bytes that cannot be interpreted as UTF-8, hence the error message.

Depending on what you want to do with the file contents, you could set the :bin named parameter to have the file opened in binary mode. Alternatively, you could use the special utf8-c8 encoding, which assumes UTF-8 until it encounters bytes it cannot decode; in that case it generates temporary (synthetic) code points so that the original bytes are preserved.

See https://docs.raku.org/language/unicode#UTF8-C8 for more information.
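As a rough sketch of both options, assuming a small made-up file that contains one byte (0xFF) that is never valid in UTF-8:

```raku
# Hypothetical demo file: "ab", an invalid byte 0xFF, then "c".
my $path = $*TMPDIR.add("utf8-c8-demo.txt");
$path.spurt(Blob.new(0x61, 0x62, 0xFF, 0x63));

# Option 1: binary mode. No decoding happens; you get the raw bytes.
my $bytes = $path.slurp(:bin);
say $bytes.elems;    # 4

# Option 2: utf8-c8. Valid UTF-8 decodes as usual; the stray byte is
# kept as a synthetic code point instead of raising "Malformed UTF-8".
my $text = $path.slurp(:enc<utf8-c8>);
say $text.starts-with("ab");    # True

# Encoding with utf8-c8 again should give back the original bytes.
my $round-trip = $text.encode("utf8-c8");
say $round-trip.list eqv $bytes.list;

# Remove the demo file again.
$path.unlink;
```

Binary mode suits cases where you only need to copy or inspect bytes; utf8-c8 is the more convenient choice when you want to treat the contents as a string but still write them back unchanged.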

Elizabeth Mattijsen answered Jan 03 '23 15:01
