
Using Perl Tie::File with a UTF-encoded file

Tags:

perl

Can I use Tie::File with a file in a UTF encoding (UTF-16LE here)? I can't get this to work right. What I am trying to do is open the UTF-encoded file, remove the header line that matches a string, and rename the file.

Code:

use strict;
use warnings;
use Tie::File;
use File::Copy;

my ($input_file) = qw (test.txt);

open my $infh, "<:encoding(UTF-16LE)", $input_file or die "cannot open '$input_file': $!";

for (<$infh>) {
    tie my @lines, "Tie::File", $_;
    shift @lines if $lines[0] =~ m/MyHeader/;
    untie @lines;
    my ($name) = /^(.*).csv/i;
    move($_, $name . ".dat");
}

close $infh
    or die "Cannot close '$input_file': $!";

Code: (updated)

my ($input_file) = qw (test.txt);
my $qfn_in = $input_file;
my $qfn_out = $qfn_in . ".dat";

open(my $fh_in, "<:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_in)
   or die("Can't open \"$qfn_in\": $!\n");

open(my $fh_out, ">:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_out)
   or die("Can't open \"$qfn_out\": $!\n");

while (<$fh_in>) {
   next if $. == 1 && /MyHeader/; 
   print($fh_out $_)
      or die("Can't write to \"$qfn_out\": $!");
}

close($fh_in);
close($fh_out) or die("Can't write to \"$qfn_out\": $!");

rename($qfn_out, $qfn_in)
   or die("Can't rename: $!\n");
jdamae asked Dec 04 '25 19:12


2 Answers

This is underdocumented in the Tie::File perldoc, but you want to pass the discipline => ':encoding(UTF-16LE)' option when you tie the file:

tie my @lines, 'Tie::File', $input_file, discipline => ':encoding(UTF-16LE)'
    or die "cannot tie '$input_file': $!";

Note that the third argument is the name of the file to associate with the tied array. Tie::File will automatically open and manage the filehandle for you; there is no need to call open on the file yourself.

@lines now contains the contents of the file, so the next thing to do is check the first line:

if ($lines[0] =~ m/pattern/) {
    my $line = shift @lines;
    untie @lines;   # closes the file, which now lacks its first line
    my ($name) = $line =~ /^(.*)\.csv/i;   # dot escaped so it matches a literal "."
    rename $input_file, "$name.dat"
        or die "cannot rename '$input_file': $!";
}

But I concur with TLP that Tie::File is overkill for this job.

(My previous answer, which opened a filehandle with the correct encoding and passed the glob as the third argument to Tie::File, won't work: (1) it didn't open the file in read/write mode, and (2) even if it had, Tie::File can't or doesn't apply the encoding to both reads from and writes to the filehandle.)

mob answered Dec 06 '25 11:12


my $qfn_in = ...;
my $qfn_out = $qfn_in . ".tmp";

open(my $fh_in, "<:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_in)
   or die("Can't open \"$qfn_in\": $!\n");

open(my $fh_out, ">:raw:perlio:encoding(UTF-16le):crlf:utf8", $qfn_out)
   or die("Can't open \"$qfn_out\": $!\n");

while (<$fh_in>) {
   next if $. == 1 && /MyHeader/;
   print($fh_out $_)
      or die("Can't write to \"$qfn_out\": $!");
}

close($fh_in);
close($fh_out) or die("Can't write to \"$qfn_out\": $!");

rename($qfn_out, $qfn_in)
   or die("Can't rename: $!\n");

(:perlio and :utf8 are workarounds for bugs that existed back then.)
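
For reference, here is a minimal sketch of the same copy-and-rename approach wrapped in a sub, without the two workaround layers. It assumes a Perl recent enough that the bugs mentioned above no longer apply; I have not pinned down which version that is, so treat the shorter layer string as an assumption. The name strip_header and its two parameters are made up for illustration and are not from any module.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): drop the first line of a
# UTF-16LE file when it matches $header_re, by writing a filtered copy
# and renaming it over the original.
sub strip_header {
    my ($qfn_in, $header_re) = @_;
    my $qfn_out = $qfn_in . ".tmp";

    # Assumption: a plain :raw:encoding(...) stack is enough on a current Perl.
    # No :crlf layer is added, so line endings pass through unchanged.
    open(my $fh_in, "<:raw:encoding(UTF-16LE)", $qfn_in)
        or die("Can't open \"$qfn_in\": $!\n");
    open(my $fh_out, ">:raw:encoding(UTF-16LE)", $qfn_out)
        or die("Can't open \"$qfn_out\": $!\n");

    while (my $line = <$fh_in>) {
        # Skip only the first line, and only if it matches the header pattern.
        next if $. == 1 && $line =~ $header_re;
        print {$fh_out} $line
            or die("Can't write to \"$qfn_out\": $!\n");
    }

    close($fh_in);
    close($fh_out) or die("Can't write to \"$qfn_out\": $!\n");

    rename($qfn_out, $qfn_in)
        or die("Can't rename \"$qfn_out\": $!\n");
    return;
}
```

It could be called as strip_header($qfn_in, qr/MyHeader/). Note that the UTF-16LE encoding does not treat a byte order mark specially, so if the file starts with one, it belongs to the first line and is dropped along with the header.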

ikegami answered Dec 06 '25 11:12


