I have a 2 GB CSV file in which the first column contains the time as an epoch timestamp and the second column contains an XML document of 10,000+ lines (stored as a single line).
I want to iterate over every line of this CSV and save the XML from the second column to a file of its own. I also use XPath to get the customer's name from the XML so that I can name the file [CustomerName]-[time from column 1].xml. However, some of the XML documents are not valid XML, and I get an error that says "Unclosed Token on Line ...". Is there a way to ignore that error and just skip the file? The following is my Perl code:
my $file = '../FILENAME.csv';
open my $info, $file or die "Could not open $file: $!";
my $count = 0;
$| = 1;
while( my $line = <$info>) {
    $count++; if($count == 1) {next;} #Ignore headers
    $line =~ /(\d+),"(.*?)"$/; #Load time into $1, XML file into $2
    my $time = $1;
    my $report = $2;
    $report =~ s/""/"/g; #Replace "" with "
    my $xp = XML::XPath->new(xml => $report);
    my $ext = $xp->getNodeText('/report/customer') . "-" . $time . ".xml"; #Generate filename with customer name and time
    write_file($ext, $report);
}
close $info;
I am also open to suggestions to make this more efficient.
You can try enclosing the code that may fail in an eval block, which traps the error so that the loop can continue with the next line. For example:
eval {
    my $xp = XML::XPath->new(xml => $report);
    my $ext = $xp->getNodeText('/report/customer') . "-" . $time . ".xml"; #Generate filename with customer name and time
    write_file($ext, $report);
};
if ( $@ ) {
    print "ERROR: $@"; #print rather than printf, so a % in the message is not read as a format directive
}
The following code:
$count++; if($count == 1) {next;} #Ignore headers
$line =~ /(\d+),"(.*?)"$/; #Load time into $1, XML file into $2
my $time = $1;
my $report = $2;
can be shortened to:
next if ++$count == 1; #Ignore headers
my ($time, $report) = ($line =~ /(\d+),"(.*)"$/); # time, XML file
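Putting the two suggestions together, here is a minimal sketch of the skip-on-error pattern. So that it runs with core Perl only, customer_name is a hypothetical stand-in for the XML::XPath calls (it just dies on input that does not look complete), and filename_for and the sample lines are made up for illustration. In the real script the body of the eval would be the XML::XPath code from the question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the XML::XPath lookup: returns the customer
# name, or dies the way a parser does when the XML is malformed.
sub customer_name {
    my ($xml) = @_;
    die "unclosed token\n" unless $xml =~ m{</report>\s*\z};
    my ($name) = $xml =~ m{<customer>([^<]*)</customer>}
        or die "no customer element\n";
    return $name;
}

# Returns the output file name for one CSV line, or undef when the line
# should be skipped (regex did not match, or the XML failed to parse).
sub filename_for {
    my ($line) = @_;
    my ($time, $report) = $line =~ /^(\d+),"(.*)"$/
        or return;                      # not a data line: skip it
    $report =~ s/""/"/g;                # undo CSV quote doubling
    my $name = eval { customer_name($report) };
    if (!defined $name) {
        warn "Skipping record $time: $@";
        return;
    }
    return "$name-$time.xml";
}

# Made-up sample data: one good record, one with truncated XML.
my @lines = (
    '1700000000,"<report><customer>Acme</customer></report>"',
    '1700000001,"<report><customer>Broken</customer>"',
);
for my $line (@lines) {
    my $file = filename_for($line);
    next unless defined $file;          # bad record: move on to the next line
    print "$file\n";
}
```

Checking the result of the regex match before using the captures also avoids reusing a stale $1 or $2 from an earlier line when a line does not match.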