
Ignore 'Unclosed Token' in Perl

Tags: xml, csv, perl, xpath

I have a 2 GB CSV file where the first column contains the time in epoch format and the second column contains an XML document of 10,000+ lines (stored as a single line).

I want to iterate through every line of this CSV and save the XML from the second column to a file of its own. I also use XPath to get the customer's name from the XML so I can name the file [CustomerName]-[time from column 1].xml. However, some of the XML documents are not valid XML, and I get an error that says Unclosed Token on Line .... Is there a way to ignore that message and just skip the file? The following is my Perl code:

use strict;
use warnings;
use XML::XPath;
use File::Slurp; # assumed source of write_file

my $file = '../FILENAME.csv';
open my $info, '<', $file or die "Could not open $file: $!";
my $count = 0;
$| = 1;

while( my $line = <$info>)  {
    $count++; if($count == 1) {next;} #Ignore headers
    $line =~ /(\d+),"(.*?)"$/; #Load time into $1, XML file into $2
    my $time = $1;
    my $report = $2;
    $report =~ s/""/"/g; #Replace "" with "
    my $xp = XML::XPath->new(xml => $report);
    my $ext = $xp->getNodeText('/report/customer') . "-" . $time . ".xml"; #Generate filename with customer name and time
    write_file($ext, $report);
}
close $info;

I am also open to suggestions to make this more efficient.

asked Sep 29 '22 00:09 by Bijan


1 Answer

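You can try enclosing the problematic code in an eval block. The parse error is raised as an exception, so eval traps it, stores the message in $@, and lets the loop continue with the next line. For example: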

eval {
  my $xp = XML::XPath->new(xml => $report);
  my $ext = $xp->getNodeText('/report/customer') . "-" . $time . ".xml"; #Generate filename with customer name and time
  write_file($ext, $report);
};
if ( $@ ) {
  warn "ERROR: $@"; # warn, not printf: a % in the message would be read as a format
}
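
To see the skip behavior in isolation, here is a minimal sketch that uses only core Perl. parse_report is a hypothetical stand-in for the XML::XPath calls: it simply dies on input that lacks a closing customer tag, which is an assumption made for illustration and not how XML::XPath detects errors. The sample rows are made up as well. The eval { ...; 1 } or-style check is a common idiom that tests the block's return value instead of relying on $@ alone.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the XML::XPath calls: returns the customer
# name, or dies the way a parser would on malformed input.
sub parse_report {
    my ($xml) = @_;
    $xml =~ m{<customer>(.*?)</customer>}s
        or die "unclosed token (simulated)\n";
    return $1;
}

# Made-up sample data: [ epoch time, XML text ].
my @reports = (
    [ 1000, '<report><customer>Acme</customer></report>' ],
    [ 2000, '<report><customer>Broken' ],    # malformed on purpose
    [ 3000, '<report><customer>Zeta</customer></report>' ],
);

my (@names, @skipped);
for my $row (@reports) {
    my ($time, $report) = @$row;
    my $ok = eval {
        push @names, parse_report($report) . "-$time.xml";
        1;    # reached only if nothing died
    };
    if (!$ok) {
        warn "Skipping record at $time: $@";
        push @skipped, $time;
        next;
    }
}

print "$_\n" for @names;
```

In the real loop, the body of the eval would hold the XML::XPath->new, getNodeText, and write_file calls from the question, so a bad record costs one warning and nothing else.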

The following code:

$count++; if($count == 1) {next;} #Ignore headers
$line =~ /(\d+),"(.*?)"$/; #Load time into $1, XML file into $2
my $time = $1;
my $report = $2;

can be shortened to:

next if ++$count == 1; #Ignore headers
my ($time, $report) = ($line =~ /(\d+),"(.*)"$/); # time, XML file
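
One caveat with either version is a line that does not match the pattern at all. A minimal sketch with made-up sample lines: in list context a failed match returns an empty list, so both variables come back undefined rather than keeping stale values the way $1 and $2 would. Appending "or next" skips such lines. The @lines array and @parsed collector are illustrative names only.

```perl
use strict;
use warnings;

# Made-up sample lines for illustration: header, good row, malformed row.
my @lines = (
    qq{time,report\n},
    qq{1411948800,"<report><customer name=""x"">Acme</customer></report>"\n},
    qq{not a valid row\n},
);

my $count = 0;
my @parsed;
for my $line (@lines) {
    next if ++$count == 1;    # ignore the header line
    # A failed match yields an empty list, so the assignment is false.
    my ($time, $report) = $line =~ /(\d+),"(.*)"$/
        or next;
    $report =~ s/""/"/g;      # CSV escapes " as ""
    push @parsed, [ $time, $report ];
}

printf "%s => %s\n", @$_ for @parsed;
```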

answered Oct 03 '22 08:10 by tivn