I am trying to make use of the Text::CSV
Perl module to be able to parse a tab delimited file.
The file I am trying to parse is:
#IGNORE COLUMN1 COLUMN2 COLUMN3 COLUMN4
ROW1 x y z a
ROW2 b c d
ROW3 w
Note that the file is tab delimited. This file may have N
columns and N
rows. Also, in the case of ROW2
, it has a fourth tab but no value. ROW3
has no tabs after the w
value for COLUMN1
. I.e. some columns may have undefined values or blank values.
So far, I have began writing a Perl script but have stumbled very early on in trying to figure out how I can write code to answer the following question:
Find out how many ROWn
there are. Then for each COLUMNn
check to see if I have ROWn
values. So in this case, COLUMN2
,COLUMN3
and COLUMN4
would have missing values.
Any tips and guidance would help (I'm new to Perl). I've looked at the CPAN Text::CSV page but I've not managed to be able to solve this problem.
#!/usr/bin/perl
use warnings;
use strict;
use v5.12;
use Text::CSV;
my $csv = Text::CSV->new ({
escape_char => '"',
sep_char => '\t',
eol => $\,
binary => 1,
blank_is_undef => 1,
empty_is_undef => 1,
});
open (my $file, "<", "tabfile.txt") or die "cannot open: $!";
while (my $row = $csv->getline ($file)) {
say @$row[0];
}
close($file);
One approach where for each line process every field and increment a counter when it's not false
:
#!/usr/bin/env perl
use warnings;
use strict;
use Text::CSV_XS;
my (@col_counter);
my ($line_counter, $r, $num_cols) = (0, 0, 0);
open my $fh, '<', shift or die;
my $csv = Text::CSV_XS->new({
sep_char => qq|\t|
});
while ( my $row = $csv->getline( $fh ) ) {
## First row (header), get the number of columns.
if ( $line_counter == 0 ) {
$num_cols = $#$row;
next;
}
## For each data row, traverse every column and increment a
## counter if it has any value.
for ( 1 .. $#$row ) {
++$col_counter[ $_ ] if $row->[ $_ ];
}
}
continue {
$line_counter++;
}
printf qq|Lines of file: %d\n|, $line_counter - 1;
## Check if any column has missing values. For each column compare the
## number of lines read (substract 1 of header) with its counter. If they
## are different it means that the column had any row without value.
for my $i ( 1 .. $num_cols ) {
$r = $line_counter - 1 - (defined $col_counter[ $i ] ? $col_counter[ $i ] : 0);
if ( $r > 0 ) {
printf qq|Column %d has %d missing values\n|, $i, $r;
}
}
With your example data, run it like:
perl script.pl infile
That yields:
Lines of file: 3
Column 2 has 1 missing values
Column 3 has 1 missing values
Column 4 has 2 missing values
UPDATE: See comments. I do the reverse looking for columns that don't contain any value and appends current line number to an array, and use join
to extract all lines.
What parts did I change? Here to save line numbers.
for ( 1 .. $num_cols ) {
push @{ $col_counter[ $_ ] }, $line_counter unless $row->[ $_ ];
}
And here to print them. You will need to comment the old behaviour.
if ( defined $col_counter[ $i ] ) {
printf qq|Column %d has no value in lines %s\n|, $i, join q|,|, @{ $col_counter[ $i ] };
}
It yields:
Lines of file: 3
Column 2 has no value in lines 3
Column 3 has no value in lines 3
Column 4 has no value in lines 2,3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With