 

How to trim a file by removing the columns with the same value

Tags:

unix

sed

awk

perl

I would like help trimming a file by removing the columns that have the same value on every line.

# the file I have (tab-delimited, millions of columns)
jack 1 5 9
john 3 5 0
lisa 4 5 7

# the file I want (remove the columns with the same value in all lines)
jack 1 9
john 3 0
lisa 4 7

Could you point me in the right direction? I would prefer a sed or awk solution, but a Perl solution would also work.

Thanks in advance. Best,

jianfeng.mao asked Jun 15 '11 19:06


2 Answers

Here's a quick Perl script to figure out which columns can be cut. It prints the 0-based indices of the columns whose value is the same on every line.

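use strict;
use warnings;

open my $fh, "<", "file" or die "Cannot open file: $!";
chomp(my $first = <$fh>);
my @baseline = split /\t/, $first;          # snag the first row
my @linemap  = 0 .. $#baseline;             # start with every column as a candidate

while (my $row = <$fh>) {                   # loop over the rest of the file
    chomp $row;                             # so the last column compares cleanly
    my @line = split /\t/, $row;
    # filter out any column that differs from the first row
    @linemap = grep { defined $line[$_] && $baseline[$_] eq $line[$_] } @linemap;
}
close $fh;

print join(" ", @linemap), "\n";            # 0-based indices of constant columns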

You can use many of the other recommendations for this question to actually remove the columns. My favorite would probably be the cut approach, partly because the Perl script above can be modified to print the exact command (or even run it for you):

@linemap = map { $_ + 1 } @linemap;         # cut numbers fields from 1, not 0
# --complement is a GNU coreutils cut option; cut splits on tabs by default
print "cut --complement -f " . join(",", @linemap) . " file\n";
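For testing, here is a minimal sketch of the same first-row comparison wrapped in a subroutine that works on in-memory lines. The name constant_fields and the sample data are illustrative assumptions, not part of the answer; the subroutine returns 1-based field numbers suitable for GNU cut's --complement -f.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based numbers of the tab-separated
# fields whose value is identical on every line.
sub constant_fields {
    my @lines = @_;
    return () unless @lines;
    my @baseline = split /\t/, $lines[0], -1;
    my @same     = 0 .. $#baseline;           # every column starts as a candidate
    for my $line (@lines[1 .. $#lines]) {
        my @f = split /\t/, $line, -1;
        # keep only the columns that still match the first line
        @same = grep { defined $f[$_] && $f[$_] eq $baseline[$_] } @same;
    }
    return map { $_ + 1 } @same;              # cut numbers fields from 1
}

my @sample = ("jack\t1\t5\t9", "john\t3\t5\t0", "lisa\t4\t5\t7");
my @fields = constant_fields(@sample);
print "cut --complement -f ", join(",", @fields), " file\n" if @fields;
```

With the sample rows from the question this prints cut --complement -f 3 file. The if @fields guard skips printing when no column is constant, since an empty -f list would not be a valid cut command.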
jkerian answered Nov 08 '22 10:11


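#!/usr/bin/perl
use strict;
use warnings;

# Remove the third column (index 2) from every tab-separated line.
open(R, "<", "/tmp/filename") || die "Cannot open /tmp/filename: $!";
while (<R>)
{
  chomp;
  my @fields = split(/\t/, $_, -1);
  splice(@fields, 2, 1);
  print join("\t", @fields) . "\n";
}
close(R);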

That assumes the column to remove is always the third one. If you want to remove fields by value instead:

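#!/usr/bin/perl
use strict;
use warnings;

# Remove every field whose value is exactly "5".
open(R, "<", "/tmp/filename") || die "Cannot open /tmp/filename: $!";
while (<R>)
{
  chomp;
  my @fields = grep { $_ ne "5" } split(/\t/, $_, -1);
  print join("\t", @fields) . "\n";
}
close(R);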

Now that the question has been edited, the OP's goal is clear. How about this two-pass approach:

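#!/usr/bin/perl
use strict;
use warnings;

my $file = "/tmp/filename";

# First pass: start from the first row, then undef any column whose value
# differs on a later row. Columns still defined are constant everywhere.
open(R, "<", $file) || die "Cannot open $file: $!";
my (@cols);
while (<R>)
{
  chomp;
  my (@this) = split(/\t/);
  if ($. == 1)
  {
    @cols = @this;
  }
  else
  {
    for (my $x = 0; $x <= $#cols; $x++)
    {
      if (defined($cols[$x]) && !(defined($this[$x]) && $cols[$x] eq $this[$x]))
      {
        $cols[$x] = undef;
      }
    }
  }
}
close(R);

# Report the constant columns on STDERR so they don't end up in the output.
# Each index is reduced by the number of columns already queued, because
# every earlier splice() shifts the later columns one place to the left.
my (@del);
print STDERR "Deleting columns: ";
for (my $x = 0; $x <= $#cols; $x++)
{
  if (defined($cols[$x]))
  {
    print STDERR "$x ($cols[$x]), ";
    push(@del, $x - int(@del));
  }
}
print STDERR "\n";

# Second pass: splice out the constant columns and print the rest.
open(R, "<", $file) || die "Cannot open $file: $!";
while (<R>)
{
  chomp;
  my (@this) = split(/\t/);
  foreach my $col (@del)
  {
    splice(@this, $col, 1);
  }
  print join("\t", @this) . "\n";
}
close(R);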
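A minimal sketch of the same two-pass idea that selects the columns to keep with an array slice instead of calling splice once per deleted column (each splice shifts every later element, which adds up with millions of columns). The subroutine name drop_constant_columns is hypothetical, and for simplicity it assumes the whole input is already in memory as a list of lines; for a large file you would read it twice, as the script above does.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every tab-separated column whose value is
# the same on all lines, and return the trimmed lines.
sub drop_constant_columns {
    my @rows = map { [ split /\t/, $_, -1 ] } @_;
    return () unless @rows;
    my @first = @{ $rows[0] };

    # Pass 1: mark a column as varying once any row disagrees with row one.
    my @varies = (0) x @first;
    for my $row (@rows[1 .. $#rows]) {
        for my $i (0 .. $#first) {
            $varies[$i] = 1
                if !defined $row->[$i] || $row->[$i] ne $first[$i];
        }
    }
    my @keep = grep { $varies[$_] } 0 .. $#first;

    # Pass 2: rebuild each row from the varying columns via an array slice.
    return map { join "\t", @{$_}[@keep] } @rows;
}

print "$_\n" for drop_constant_columns("jack\t1\t5\t9", "john\t3\t5\t0", "lisa\t4\t5\t7");
```

With the sample data it prints the three trimmed rows from the question (jack 1 9, john 3 0, lisa 4 7, tab-separated).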
Seth Robertson answered Nov 08 '22 08:11