Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using Perl to parse a CSV file from a particular row to the end of the file

am very new to Perl and need your help

I have a CSV file xyz.csv with contents:

here level1 and er values are strings names...not numbers...

level1,er
level2,er2
level3,er3
level4,er4

I parse this CSV file using the script below and pass the fields to an array in the first run

open(my $d, '<', $file) or die "Could not open '$file' $!\n";
while (my $line = <$d>) {
  chomp $line; 
  my @data = split "," , $line; 
  @XYX = ( [ "$data[0]", "$data[1]" ], );
}

For the second run I take an input from a command prompt and store in variable $val. My program should parse the CSV file from the value stored in variable until it reaches the end of the file

For example

I input level2 so I need a script to parse from the second line to the end of the CSV file, ignoring the values before level2 in the file, and pass these values (level2 to level4) to the @XYX = (["$data[1]","$data[1]"],);}

level2,er2
level3,er3
level4,er4

I input level3 so I need a script to parse from the third line to the end of the CSV file, ignoring the values before level3 in the file, and pass these values (level3 and level4) to the @XYX = (["$data[0]","$data[1]"],);}

level3,er3
level4,er4

How do I achieve that? Please do give your valuable suggestions. I appreciate your help

like image 813
hi123 Avatar asked Aug 31 '12 11:08

hi123


3 Answers

As long as you are certain that there are never any commas in the data you should be OK using split. But even so it would be wise to limit the split to two fields, so that you get everything up to the first comma and everything after it

There are a few issues with your code. First of all I hope you are putting use strict and use warnings at the top of all your Perl programs. That simple measure will catch many trivial problems that you could otherwise overlook, and so it is especially important before you ask for help with your code

It isn't commonly known, but putting a newline "\n" at the end of your die string prevent Perl from giving file and line number details in the output of where the error occurred. While this may be what you want, it is usually more helpful to be given the extra information

Your variable names are verly unhelpful, and by convention Perl variables consist of lower-case alphanumerics and underscores. Names like @XYX and $W don't help me understand your code at all!

Rather than splitting to an array, it looks like you would be better off putting the two fields into two scalar variables to avoid all that indexing. And I am not sure what you intend by @XYX = (["$data[1]","$data[1]"],). First of all do you really mean to use $data[1] twice? Secondly, your should never put scalar variables inside double quotes, as it does something very specific, and unless you know what that is you should avoid it. Finally, did you mean to push an anonymous array onto @XYX each time around the loop? Otherwise the contents of the array will be overwritten each time a line is read from the file, and the earlier data will be lost

This program uses a regular expression to extract $level_num from the first field. All it does it find the first sequence of digits in the string, which can then be compared to the minimum required level $min_level to decide whether a line from the log is relevant

use strict;
use warnings;

my $file = 'xyz.csv';
my $min_level = 3;
my @list;

open my $fh, '<', $file or die "Could not open '$file' $!";

while (my $line = <$fh>) {
  chomp $line; 
  my ($level, $error) = split ',', $line, 2;
  my ($level_num) = $level =~ /(\d+)/;
  next unless $level_num >= $min_level;
  push @list, [ $level, $error ];
}
like image 109
Borodin Avatar answered Nov 18 '22 04:11

Borodin


For deciding which records to process you can use the "flip-flop" operator (..) along these lines.

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

my $level = shift || 'level1';

while (<DATA>) {
  if (/^\Q$level,/ .. 0) {
    print;
  }
}

__DATA__
level1,er
level2,er2
level3,er3
level4,er4

The flip-flop operator returns false until its first operand is true. At that point it returns false until its second operand is true; at which point it returns false again.

I'm assuming that your file is ordered so that once you start to process it, you never want to stop. That means that the first operand to the flip-flop can be /^\Q$level,/ (match the string $level at the start of the line) and the second operand can just be zero (as we never want it to stop processing).

I'd also strongly recommend not parsing CSV records using split /,/. That may work on your current data but, in general, the fields in a CSV file are allowed to contain embedded commas which will break this approach. Instead, have a look at Text::CSV or Text::ParseWords (which is included with the standard Perl distribution).

Update: I seem to have got a couple of downvotes on this. It would be great if people would take the time to explain why.

like image 1
Dave Cross Avatar answered Nov 18 '22 03:11

Dave Cross


#!/usr/bin/perl

use strict;
use warnings;
use Text::CSV;

my @XYZ;
my $file = 'xyz.csv';
open my $fh, '<', $file or die "$file: $!\n";

my $level = shift; # get level from commandline
my $getall = not defined $level; # true if level not given on commandline

my $parser = Text::CSV->new({ binary => 1 }); # object for parsing lines of CSV

while (my $row = $parser->getline($fh)) # $row is an array reference containing cells from a line of CSV
{
  if ($getall # if level was not given on commandline, then put all rows into @XYZ
      or      # if level *was* given on commandline, then...
      $row->[0] eq $level .. 0 # ...wait until the first cell in a row equals $level, then put that row and all subsequent rows into @XYZ
     )
  {
    push @XYZ, $row;
  }
}

close $fh;
like image 1
Oktalist Avatar answered Nov 18 '22 04:11

Oktalist