I would like help trimming a file by removing the columns that hold the same value on every line.
# the file I have (tab-delimited, millions of columns)
jack 1 5 9
john 3 5 0
lisa 4 5 7
# the file I want (remove the columns with the same value in all lines)
jack 1 9
john 3 0
lisa 4 7
Could you please give me any directions on this problem? I prefer a sed or awk solution, or maybe a perl solution.
Thanks in advance. Best,
Here's a quick perl script to figure out which columns can be cut.
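open my $fh, '<', 'file' or die "Cannot open file: $!";
my @baseline = split /\t/, <$fh>; #snag the first row
chomp @baseline; #drop the newline from the last field
my @linemap = 0..$#baseline; #list all candidate columns (all of them)
while(<$fh>) { #loop over the rest of the file
my @line = split /\t/;
chomp @line;
@linemap = grep {defined $line[$_] && $baseline[$_] eq $line[$_]} @linemap; #keep only the columns that still match the first row
}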
print join " ", @linemap;
print "\n";
You can use many of the above recommendations to actually remove the columns. My favorite would probably be the cut implementation, partly because the Perl script above could be modified to print the precise command (or even run it for you). Note that --complement is a GNU cut extension.
@linemap = map {$_+1} @linemap; #Cut is 1-index based
print "cut --complement -f ".join(",",@linemap)." file\n";
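Since the question asked for awk, the same two-pass idea can be sketched there as well. This is only a minimal sketch, not a tested production tool: the function name drop_constant_columns is made up for illustration, and it assumes a regular tab-delimited file (it is read twice, so a pipe will not work) with the same number of fields on every line.

```shell
# Hypothetical helper: print FILE without the columns whose value is identical on every line.
drop_constant_columns() {
    awk -F'\t' -v OFS='\t' '
        # Pass 1 (NR == FNR): remember the first row, then flag every column that differs from it.
        NR == FNR {
            if (FNR == 1) {
                # Appending "" stores the value as a string, so the comparison below is textual.
                for (i = 1; i <= NF; i++) first[i] = $i ""
            } else {
                for (i = 1; i <= NF; i++) if ($i != first[i]) varies[i] = 1
            }
            next
        }
        # Pass 2: print only the columns that were flagged as varying.
        {
            out = ""; sep = ""
            for (i = 1; i <= NF; i++)
                if (i in varies) { out = out sep $i; sep = OFS }
            print out
        }
    ' "$1" "$1"
}
```

Usage would look like: drop_constant_columns file > file.trimmed. Unlike the cut approach, this does not build a long field list on the command line, which may matter with millions of columns.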
#!/usr/bin/perl
$/="\t";
open(R,"<","/tmp/filename") || die;
while (<R>)
{
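next if (($. % 3) == 0); # records end at tabs, not newlines, so a 4-column line yields 3 records and the third column is every 3rd record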
print;
}
Well, this was assuming it was the third column. If it is by value:
#!/usr/bin/perl
$/="\t";
open(R,"<","/tmp/filename") || die;
while (<R>)
{
next if ($_ == 5);
print;
}
With the question edit, OP's desires become clear. How about:
#!/usr/bin/perl
open(R,"<","/tmp/filename") || die;
my (@cols); # values from the first row; an entry becomes undef once its column is seen to vary
while (<R>)
{
my (@this) = split(/\t/);
if ($. == 1)
{
@cols = @this;
}
else
{
for(my $x=0;$x<=$#cols;$x++)
{
if (defined($cols[$x]) && (!defined($this[$x]) || $cols[$x] ne $this[$x])) # plain string comparison; avoids the experimental smartmatch operator (~~)
{
$cols[$x] = undef;
}
}
}
}
close(R);
my(@del);
print "Deleting columns: ";
for(my $x=0;$x<=$#cols;$x++)
{
if (defined($cols[$x]))
{
print "$x ($cols[$x]), ";
push(@del,$x-scalar(@del)); # shift the index left by the number of columns already deleted before it
}
}
print "\n";
open(R,"<","/tmp/filename") || die;
while (<R>)
{
chomp;
my (@this) = split(/\t/);
foreach my $col (@del)
{
splice(@this,$col,1);
}
print join("\t",@this)."\n";
}
close(R);
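Before trusting the "Deleting columns" report on a large file, it may be worth spot-checking a reported column. The following is a small sketch under assumptions: the function name count_distinct_in_column is hypothetical, the file is tab-delimited (cut's default delimiter), and the column number is 1-based as cut expects, so add 1 to the 0-based index the script prints.

```shell
# Hypothetical helper: count the distinct values found in column N (1-based) of FILE.
count_distinct_in_column() {
    # cut extracts the column, sort -u collapses duplicates, awk counts the remaining lines.
    cut -f"$2" "$1" | sort -u | awk 'END { print NR }'
}
```

A column that is safe to delete should report exactly 1, for example: count_distinct_in_column /tmp/filename 3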