Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In what encoding does readpipe return the result of an executed command?

Here's a simple perl script that is supposed to write a utf-8 encoded file:

use warnings;
use strict;

open (my $out, '>:encoding(utf-8)', 'tree.out') or die;

print $out readpipe ('tree ~');

close $out;

I have expected readpipe to return a utf-8 encoded string since LANG is set toen_US.UTF-8. However, looking at tree.out (while making sure the editor recognizes it a as utf-8 encoded) shows me all garbled text.

If I change the >:encoding(utf-8) in the open statement to >:encoding(latin-1), the script creates a utf-8 file with the expected text.

This is all a bit strange to me. What is the explanation for this behavior?

like image 725
René Nyffenegger Avatar asked May 02 '16 14:05

René Nyffenegger


1 Answers

readpipe is returning to perl a string of undecoded bytes. We know that that string is UTF-8 encoded, but you've not told Perl.

The IO layer on your output handle is taking that string, assuming it is Unicode code-points and re-encoding them as UTF-8 bytes.

The reason that the latin-1 IO layer appears to be functioning correctly is that it is writing out each undecoded byte unmolested because the 1st 256 unicode code-points correspond nicely with latin-1.

The proper thing to do would be to decode the byte-string returned by readpipe into a code-point-string, before feeding it to an IO-layer. The statement use open ':utf8', as mentioned by Borodin, should be a viable solution as readpipe is specifically mentioned in the open manual page.

like image 142
tjd Avatar answered Oct 23 '22 23:10

tjd