I am trying to make use of the Text::CSV Perl module to be able to parse a tab delimited file.
The file I am trying to parse is:
#IGNORE COLUMN1 COLUMN2 COLUMN3 COLUMN4
ROW1 x y z a
ROW2 b c d
ROW3 w
Note that the file is tab delimited. This file may have N columns and N rows. Also, in the case of ROW2, it has a fourth tab but no value. ROW3 has no tabs after the w value for COLUMN1. I.e. some columns may have undefined values or blank values.
So far, I have began writing a Perl script but have stumbled very early on in trying to figure out how I can write code to answer the following question:
Find out how many ROWn there are. Then for each COLUMNn check to see if I have ROWn values. So in this case, COLUMN2,COLUMN3 and COLUMN4 would have missing values.
Any tips and guidance would help (I'm new to Perl). I've looked at the CPAN Text::CSV page but I've not managed to be able to solve this problem.
#!/usr/bin/perl
use warnings;
use strict;
use v5.12;
use Text::CSV;
my $csv = Text::CSV->new ({
escape_char => '"',
sep_char => '\t',
eol => $\,
binary => 1,
blank_is_undef => 1,
empty_is_undef => 1,
});
open (my $file, "<", "tabfile.txt") or die "cannot open: $!";
while (my $row = $csv->getline ($file)) {
say @$row[0];
}
close($file);
One approach where for each line process every field and increment a counter when it's not false:
#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;
my (@col_counter);
my ($line_counter, $r, $num_cols) = (0, 0, 0);
open my $fh, '<', shift or die;
my $csv = Text::CSV_XS->new({
sep_char => qq|\t|
});
while ( my $row = $csv->getline( $fh ) ) {
## First row (header), get the number of columns.
if ( $line_counter == 0 ) {
$num_cols = $#$row;
next;
}
## For each data row, traverse every column and increment a
## counter if it has any value.
for ( 1 .. $#$row ) {
++$col_counter[ $_ ] if $row->[ $_ ];
}
}
continue {
$line_counter++;
}
printf qq|Lines of file: %d\n|, $line_counter - 1;
## Check if any column has missing values. For each column compare the
## number of lines read (substract 1 of header) with its counter. If they
## are different it means that the column had any row without value.
for my $i ( 1 .. $num_cols ) {
$r = $line_counter - 1 - (defined $col_counter[ $i ] ? $col_counter[ $i ] : 0);
if ( $r > 0 ) {
printf qq|Column %d has %d missing values\n|, $i, $r;
}
}
With your example data, run it like:
perl script.pl infile
That yields:
Lines of file: 3
Column 2 has 1 missing values
Column 3 has 1 missing values
Column 4 has 2 missing values
UPDATE: See comments. I do the reverse looking for columns that don't contain any value and appends current line number to an array, and use join to extract all lines.
What parts did I change? Here to save line numbers.
for ( 1 .. $num_cols ) {
push @{ $col_counter[ $_ ] }, $line_counter unless $row->[ $_ ];
}
And here to print them. You will need to comment the old behaviour.
if ( defined $col_counter[ $i ] ) {
printf qq|Column %d has no value in lines %s\n|, $i, join q|,|, @{ $col_counter[ $i ] };
}
It yields:
Lines of file: 3
Column 2 has no value in lines 3
Column 3 has no value in lines 3
Column 4 has no value in lines 2,3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With