
Perl: convert a filehandle in place (streaming) from cp1252 to UTF-8?

Tags: utf-8, perl

I have a filehandle open on a file containing cp1252 characters. I want to give that open filehandle to a library that expects raw UTF-8 bytes, which it will send over the network.

The naive way to do it would be to write the file out to a second file with the right encoding, and give the second filehandle to the library:

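use strict;
use warnings;
use Fcntl qw/SEEK_SET/;

open my $fh_1252, "<:encoding(cp1252)", "1252.txt" or die "Cannot open 1252.txt: $!";

open my $fh_utf8, "+>:encoding(UTF-8)", "utf8.txt" or die "Cannot open utf8.txt: $!";

while (<$fh_1252>) { print $fh_utf8 $_ }

seek($fh_utf8, 0, SEEK_SET);

# Drop the :encoding layer so reads return raw UTF-8 bytes, not characters
binmode($fh_utf8);

# now give $fh_utf8 to the library for transmission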

That seems like a lot of extra work. Is there a way to just stream it? I know I could use IO::Scalar to avoid writing to disk, but I'd still have to read the whole file into memory. It seems like there should be a way to stream it through a pipeline, but I can't think of one right now.
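A minimal sketch of the pipeline idea, assuming a Unix-like system where fork works: open with the '-|' mode forks a child whose STDOUT becomes the parent's read handle, so the child can decode cp1252 and re-encode as UTF-8 while the parent reads a plain byte stream. The helper name open_cp1252_as_utf8 is hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns a read handle that yields the UTF-8 bytes
# of a cp1252-encoded file, converted on the fly by a forked child.
sub open_cp1252_as_utf8 {
    my ($path) = @_;

    # Implicit fork: the parent gets a handle that reads the child's STDOUT.
    my $pid = open(my $fh, '-|');
    die "Cannot fork: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: decode cp1252 on input, encode UTF-8 on output.
        open(my $in, '<:encoding(cp1252)', $path)
            or do { warn "Cannot open $path: $!"; POSIX::_exit(1) };
        binmode(STDOUT, ':encoding(UTF-8)');
        print while <$in>;
        close(STDOUT);      # flush everything into the pipe
        POSIX::_exit(0);    # skip END blocks and destructors copied from the parent
    }

    # Parent: $fh has no encoding layer, so reads return raw UTF-8 bytes.
    return $fh;
}
```

The parent can hand $fh to the library; close($fh) afterwards waits for the child and leaves its exit status in $?. Only a pipe buffer's worth of data is in flight at any time, so memory use does not grow with the file size.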

asked Oct 03 '14 18:10 by Kevin G.


1 Answer

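You can write your own PerlIO layer in Perl and apply it with :via(MODULE) when opening the file. The layer can pass the data through Text::Iconv to convert it from one character set to another, so the handle yields UTF-8 bytes while the file is still read incrementally.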
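The conversion such a layer performs boils down to: cp1252 bytes in, UTF-8 bytes out. A minimal sketch of just that step, using the core Encode module's from_to instead of Text::Iconv (a swap for illustration; the function name cp1252_chunk_to_utf8 is hypothetical). Because cp1252 uses exactly one byte per character, each line or block can be converted on its own without ever splitting a character.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: convert one chunk of cp1252 bytes to UTF-8 bytes.
# Encode::from_to converts in place and returns the new length, or undef on failure.
sub cp1252_chunk_to_utf8 {
    my ($chunk) = @_;
    defined Encode::from_to($chunk, 'cp1252', 'UTF-8')
        or die "cp1252 -> UTF-8 conversion failed";
    return $chunk;
}

# 0x93/0x94 are curly double quotes and 0x80 is the euro sign in cp1252.
my $utf8 = cp1252_chunk_to_utf8("\x93hi\x94 \x80");
printf "%vX\n", $utf8;    # prints E2.80.9C.68.69.E2.80.9D.20.E2.82.AC
```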

Writing such a layer is described in the PerlIO::via(3pm) manual page. In short, you need to create your own module, e.g. PerlIO::via::Example: create a PerlIO/via directory under one of the directories in @INC (or add its parent with use lib) and put Example.pm there, with the following content:

package PerlIO::via::Example;

use strict;
use warnings;

use Text::Iconv;
my $converter = Text::Iconv->new("windows-1252", "utf-8");

sub PUSHED
{
    my ($class, $mode, $fh) = @_;
    # When writing we buffer the data
    my $buf = '';
    return bless \$buf, $class;
}

sub FILL
{
    my ($obj, $fh) = @_;
    my $line = <$fh>;
    return undef unless defined $line;    # end of file
    my $converted = $converter->convert($line);
    die "cp1252 -> UTF-8 conversion failed\n" unless defined $converted;
    return $converted;
}

sub WRITE
{
    my ($obj,$buf,$fh) = @_;
    $$obj .= $buf; # writes are buffered as-is, without conversion
    return length($buf);
}

sub FLUSH
{
    my ($obj, $fh) = @_;
    print $fh $$obj or return -1;
    $$obj = '';
    return 0;
}

1;

Then use it in open like this:

use strict;
use warnings;

use PerlIO::via::Example;

open(my $fh, "<:via(Example)", "input.txt") or die "Cannot open input.txt: $!";
while (<$fh>) {
    print;
}
close $fh;
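If Text::Iconv is not installed, a similar read-only layer can be sketched with the core Encode module. This assumes a hypothetical package name, PerlIO::via::Cp1252ToUtf8, defined in the same file as the open call, and it reads fixed-size records instead of lines so that a file without newlines still streams in bounded memory (safe because cp1252 has one byte per character).

```perl
use strict;
use warnings;

package PerlIO::via::Cp1252ToUtf8;    # hypothetical layer name
use Encode ();

sub PUSHED {
    my ($class, $mode, $fh) = @_;
    my %state;                         # no per-handle state is needed
    return bless \%state, $class;      # read-only sketch: only FILL is implemented
}

sub FILL {
    my ($obj, $fh) = @_;
    local $/ = \8192;                  # read 8 KiB records from the layer below
    my $chunk = <$fh>;
    return undef unless defined $chunk;    # end of file
    # cp1252 is one byte per character, so a record never splits a character
    Encode::from_to($chunk, 'cp1252', 'UTF-8');
    return $chunk;                     # UTF-8 octets for the layer above
}

package main;

# Reads through this handle return UTF-8 bytes converted from the cp1252 file:
# open(my $fh, '<:via(Cp1252ToUtf8)', '1252.txt') or die "Cannot open 1252.txt: $!";
```

Because the package is compiled before the open runs, :via(Cp1252ToUtf8) finds it without a separate .pm file. One caveat for the networking library: read, readline and print go through PerlIO layers, but sysread reads the file descriptor directly and would bypass this (or any) :via layer.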
answered Nov 07 '22 12:11 by afenster