
"Invalid argument" when using three-argument open in Perl

Tags: encoding, perl

I'm extremely new to Perl (and to programming, for that matter), so I'm sorry if this is just a stupid mistake.

I'm trying to write a script that reads a list of filenames from a .txt file, opens each file, looks for lines that match a regex, and writes those lines to a new file as valid CSV (using the regex's capture groups as the fields).

My script works for English UTF-8 files, but when it processes non-English files the text appears with spaces between the letters and the regex doesn't match. I'm guessing this is because those files are saved in UTF-16. My idea was to switch to the three-argument form of open so I could add the ":encoding(UTF-16)" layer for the non-English files, but that produces an "Invalid argument" error. In fact, I can't get the script to run at all unless I use the two-argument open.
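A small illustration of the guess above, using the core Encode module. It assumes the files are UTF-16LE (the byte order Windows typically uses; this is an assumption about the asker's files). In that encoding every ASCII character is stored as two bytes, the character followed by a NUL byte, and many editors and terminals render those NULs as blanks. The regex only matches once the bytes are decoded back into characters, which is the job an :encoding(...) layer on open performs.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text  = "123\tHello";
# The bytes a UTF-16LE file would contain for this text.
my $bytes = encode('UTF-16LE', $text);

# Each ASCII character becomes two bytes: itself followed by a NUL ("\0").
printf "%d characters -> %d bytes\n", length $text, length $bytes;

# The script's pattern fails on the raw bytes: "1" is followed by "\0", not a digit.
my $raw_match = $bytes =~ /^(\d{3,5})\t(.*)$/ ? 1 : 0;

# After decoding back to characters, the same pattern matches.
my $decoded       = decode('UTF-16LE', $bytes);
my $decoded_match = $decoded =~ /^(\d{3,5})\t(.*)$/ ? 1 : 0;

print "raw bytes match: ", ($raw_match     ? "yes" : "no"), "\n";
print "decoded match:   ", ($decoded_match ? "yes" : "no"), "\n";
```

Running it should report 9 characters becoming 18 bytes, no match on the raw bytes, and a match on the decoded text.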

Here's my script.

use 5.010;
use strict;
use warnings;

use File::Slurp;

my @intfilelist = read_file('filelist_int.txt');

unlink "int_temp.csv";

foreach my $intfile (@intfilelist) {
    open (my $file, "<:encoding(UTF-16)", $intfile) or die "Whoops! $!";
    while (my $line = <$file>) {
        if ($line =~ m/^(\d{3,5})\t(.*)$/) {
            chomp $line;
            open (my $csv, ">>", "int_temp.csv");
            print $csv ("\"$intfile\",\"$1\",\"$2\"\n");
            close $csv;
        }
    }
}

Changing open (my $file, "<:encoding(UTF-16)", $intfile) to open (my $file, $intfile) causes the script to work, except for the aforementioned issues with non-English files.

Like I said, I've only been playing with Perl for two days, so sorry if I've misused some terminology or overlooked something obvious. I'd appreciate any help!

asked Jan 16 '14 21:01 by nkaun


1 Answer

Remove the trailing newline from each filename that you read from the first file with File::Slurp. You can do this with chomp $intfile; right before the open.
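A minimal sketch of the question's loop with that fix applied. To stay self-contained and runnable without CPAN modules, it reads the list with a plain open instead of File::Slurp and creates its own hypothetical input files (sample_de.txt, filelist_int.txt) in a temporary directory. Two further changes go beyond this answer and are assumptions on my part: the CSV is opened once with an :encoding(UTF-8) layer, so decoded text is written out as UTF-8 rather than as raw Latin-1 bytes or with "Wide character" warnings, and each line has its LF or CRLF ending stripped before matching.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Spec;
use File::Temp qw(tempdir);

# --- Demo set-up only: hypothetical input files in a throwaway directory ---
my $dir       = tempdir(CLEANUP => 1);
my $data_file = File::Spec->catfile($dir, 'sample_de.txt');
my $list_file = File::Spec->catfile($dir, 'filelist_int.txt');
my $csv_file  = File::Spec->catfile($dir, 'int_temp.csv');

# Write a UTF-16 file; encode('UTF-16', ...) prepends a byte order mark.
open(my $w, '>:raw', $data_file) or die "Cannot write $data_file: $!";
print {$w} encode('UTF-16',
    "123\tGr\x{fc}\x{df}e\nnot a match\n45678\t\x{dc}bersicht\n");
close $w;

# The file list has one name per line, so every name ends in "\n".
open(my $lw, '>', $list_file) or die "Cannot write $list_file: $!";
print {$lw} "$data_file\n";
close $lw;

# --- The question's loop, with the fix applied ---
open(my $list, '<', $list_file) or die "Cannot read $list_file: $!";
my @intfilelist = <$list>;
close $list;

# Open the CSV once, as UTF-8, rather than once per matching line.
open(my $csv, '>:encoding(UTF-8)', $csv_file)
    or die "Cannot write $csv_file: $!";

foreach my $intfile (@intfilelist) {
    chomp $intfile;    # the fix: drop the trailing newline before open
    open(my $file, '<:encoding(UTF-16)', $intfile)
        or die "Cannot open '$intfile': $!";
    while (my $line = <$file>) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF before matching
        if ($line =~ m/^(\d{3,5})\t(.*)$/) {
            print {$csv} qq{"$intfile","$1","$2"\n};
        }
    }
    close $file;
}
close $csv;
```

An equivalent way to apply the fix is to chomp the whole list at once, e.g. chomp(my @intfilelist = <$list>);, since chomp applied to an array removes the newline from every element.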

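chomp (see perldoc -f chomp) removes a trailing newline from the end of a given string (strictly, it removes whatever string $/ holds, which is "\n" by default).

That newline is also why the two-argument open seemed to work: in the one- and two-argument forms, Perl deletes leading and trailing whitespace from the filename, so the stray newline was silently discarded. The three-argument form uses the name exactly as given, newline included, so it asks for a file that does not exist.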
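The whitespace difference between the two forms of open can be reproduced in a few lines. This sketch assumes a throwaway file created with the core File::Temp module; sample.txt is a hypothetical name. The exact error text for the failing call depends on the operating system: on Unix-like systems it is typically "No such file or directory", and the "Invalid argument" in the question is presumably what Windows reports, since Windows does not allow newlines in filenames.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Create a throwaway file to open (hypothetical name, for illustration only).
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'sample.txt');
open(my $out, '>', $path) or die "Cannot create $path: $!";
print {$out} "123\tHello\n";
close $out;

# Simulate a name read from filelist_int.txt: it still ends in "\n".
my $name_with_newline = "$path\n";

# Two-argument open strips leading and trailing whitespace from the name,
# so the stray newline is silently discarded and the open succeeds.
my $two_arg_ok = open(my $fh2, $name_with_newline) ? 1 : 0;
print "two-argument open:   ", ($two_arg_ok ? "ok" : "failed: $!"), "\n";

# Three-argument open uses the name literally, newline included, so it
# looks for a file that does not exist and fails.
my $three_arg_ok = open(my $fh3, '<', $name_with_newline) ? 1 : 0;
print "three-argument open: ", ($three_arg_ok ? "ok" : "failed: $!"), "\n";
```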

answered Nov 11 '22 16:11 by marderh