How to trim a file - remove the columns with the same value

Question

I would like your help on trimming a file by removing the columns with the same value.

# the file I have (tab-delimited, millions of columns)
jack 1 5 9
john 3 5 0
lisa 4 5 7

# the file I want (remove the columns with the same value in all lines)
jack 1 9
john 3 0
lisa 4 7

Could you please give me any directions on this problem? I prefer a sed or awk solution, or maybe a perl solution.

Thanks in advance. Best,

jkerian · Accepted Answer

Here's a quick perl script to figure out which columns can be cut.

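use strict;
use warnings;

open my $fh, '<', 'file' or die $!;
chomp(my $first = <$fh>);
my @baseline = split /\t/, $first, -1;  #snag the first row (-1 keeps trailing empty fields)
my @linemap = 0..$#baseline;            #list all equivalent columns (all of them)

while (<$fh>) {                         #loop over the file
    chomp;
    my @line = split /\t/, $_, -1;
    @linemap = grep {$baseline[$_] eq $line[$_]} @linemap; #filter out any that aren't equal
}
close $fh;
print join(" ", @linemap), "\n";        #0-based indexes of the constant columns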

You can use many of the above recommendations to actually remove the columns. My favorite would probably be the cut implementation, partly because the above Perl script could be modified to give you the precise command (or even run it for you).

@linemap = map {$_+1} @linemap;                   #cut is 1-index based
print "cut --complement -f ".join(",",@linemap)." file\n";   #--complement is a GNU cut option

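Since the question mentions awk, here is a minimal sketch of the same two-pass idea in awk: the first pass marks every column whose value differs from the first row, and the second pass prints only the marked columns. The function name drop_constant_columns is hypothetical, and the sketch assumes a regular, seekable file (it is read twice) in which every row has the same number of tab-separated fields.

```shell
# Hypothetical helper: print the file named in $1 without its constant columns.
drop_constant_columns() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR {                        # pass 1: first reading of the file
            if (FNR == 1) {
                ncols = NF
                for (i = 1; i <= NF; i++) first[i] = $i ""   # store as strings
            } else {
                for (i = 1; i <= ncols; i++)
                    if (($i "") != first[i]) varies[i] = 1   # string comparison
            }
            next
        }
        {                                  # pass 2: second reading of the file
            sep = ""
            for (i = 1; i <= ncols; i++)
                if (i in varies) { printf "%s%s", sep, $i; sep = OFS }
            print ""                       # end the output row
        }
    ' "$1" "$1"
}
```

It could be called as drop_constant_columns file > trimmed. Comparing as strings means that 1 and 1.0 count as different values, which may or may not be what you want for numeric data.
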
Seth Robertson · Answer

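#!/usr/bin/perl
$/="\t";                      # read tab-terminated records instead of lines
open(R,"<","/tmp/filename") || die;
while (<R>)
{
  # Newlines are not separators here, so each 4-column line contributes
  # 3 records and the third column is every third record.
  next if (($. % 3) == 0);
  print;
}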

Well, this was assuming it was the third column. If it is by value:

#!/usr/bin/perl
$/="\t";
open(R,"<","/tmp/filename") || die;
while (<R>)
{
  next if ($_ == 5);   # skips every record whose numeric value is 5
  print;
}

With the question edit, OP's desires become clear. How about:

#!/usr/bin/perl
open(R,"<","/tmp/filename") || die;
my (@cols);
while (<R>)
{
  chomp;
  my (@this) = split(/\t/);
  if ($. == 1)
  {
    @cols = @this;
  }
  else
  {
    for(my $x=0;$x<=$#cols;$x++)
    {
      # plain string comparison; the smartmatch operator (~~) is deprecated in newer Perls
      if (defined($cols[$x]) && $cols[$x] ne $this[$x])
      {
        $cols[$x] = undef;
      }
    }
  }
}
close(R);
my(@del);
print "Deleting columns: ";
for(my $x=0;$x<=$#cols;$x++)
{
  if (defined($cols[$x]))
  {
    print "$x ($cols[$x]), ";
    push(@del,$x-int(@del));
  }
}
print "\n";

open(R,"<","/tmp/filename") || die;
while (<R>)
{
  chomp;
  my (@this) = split(/\t/);

  foreach my $col (@del)
  {
    splice(@this,$col,1);   # indexes in @del were already shifted for the earlier removals
  }

  print join("\t",@this)."\n";
}
close(R);

Tags: unix, sed, awk, perl

jianfeng.mao
