Removing Lines and columns with all zeros

Tags: bash, unix, awk, perl

How can I delete the lines (rows) and columns of a text file that contain only zeros? For example, I have a file:

1 0 1 0 1
0 0 0 0 0
1 1 1 0 1
0 1 1 0 1
1 1 0 0 0
0 0 0 0 0
0 0 1 0 1  

I want to delete the 2nd and 6th lines and also the 4th column. The output should look like:

1 0 1 1 
1 1 1 1 
0 1 1 1 
1 1 0 0 
0 0 1 1 

For the all-zero lines, I can do this using sed or egrep:

  sed '/0 0 0 0/d' or egrep -v '^(0 0 0 0 )$'

but that would be too inconvenient for files with thousands of columns. I have no idea how to remove a column with all zeros (the 4th column here).
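
For the row part alone, one option is to test every field of a line instead of matching a fixed-width pattern. A minimal Ruby sketch, assuming whitespace-separated fields; `drop_zero_rows` is a hypothetical helper name, and it does nothing about zero columns:

```ruby
# Hypothetical helper: keep only the lines with at least one field that is
# not the literal string "0". The number of columns does not matter.
# Note: blank lines have no fields at all, so they are dropped as well.
def drop_zero_rows(lines)
  lines.reject { |line| line.split.all? { |field| field == "0" } }
end

sample = ["1 0 1 0 1", "0 0 0 0 0", "1 1 1 0 1"]
puts drop_zero_rows(sample)
```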

discipulus asked Aug 06 '13 11:08



2 Answers

Perl solution. It keeps all the non-zero lines in memory and prints them at the end, because it cannot tell which columns are non-zero until it has processed the whole file. If you run out of memory, you can store only the numbers of the lines you want to output, and then read the file a second time to print them.

#!/usr/bin/perl
use warnings;
use strict;

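my @nonzero;                                       # Which columns were not zero.
my @output;                                        # The whole table for output.

while (<>) {
    next unless /1/;                               # Skip all-zero lines (assumes only 0s and 1s).
    my @col = split;
    # Flag every column that holds a non-zero value on this line.
    $col[$_] and $nonzero[$_] ||= 1 for 0 .. $#col;
    push @output, \@col;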
}

my @columns = grep $nonzero[$_], 0 .. $#nonzero;   # What columns to output.
for my $line (@output) {
    print "@{$line}[@columns]\n";
}
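
The low-memory variant mentioned above (store only line numbers, then read the file again) could look roughly like this in Ruby. This is a sketch, not part of the original Perl answer: `scan` and `filter_file` are hypothetical names, fields are assumed to be whitespace-separated numbers, and anything that does not parse as a number counts as zero.

```ruby
require "set"

# Pass 1: record which columns ever hold a non-zero value and which line
# numbers belong to non-zero rows. Only indexes are stored, not the rows.
def scan(path)
  nonzero_cols = []
  keep_lines = []
  File.foreach(path).with_index do |line, lineno|
    row_nonzero = false
    line.split.each_with_index do |field, i|
      next if field.to_f.zero?
      nonzero_cols[i] = true
      row_nonzero = true
    end
    keep_lines << lineno if row_nonzero
  end
  columns = nonzero_cols.each_index.select { |i| nonzero_cols[i] }
  [columns, keep_lines]
end

# Pass 2: read the file again and print the kept lines, restricted to the
# kept columns.
def filter_file(path, out = $stdout)
  columns, keep_lines = scan(path)
  wanted = Set.new(keep_lines)
  File.foreach(path).with_index do |line, lineno|
    next unless wanted.include?(lineno)
    out.puts line.split.values_at(*columns).join(" ")
  end
end

filter_file(ARGV[0]) if __FILE__ == $0 && ARGV[0]
```

Memory use grows with the number of kept lines and columns rather than with the size of the table, at the cost of reading the file twice.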
choroba answered Oct 16 '22 21:10


Rather than storing lines in memory, this version scans the file twice: once to find the "zero columns", and again to find the "zero rows" and print the output:

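awk '
    # Pass 1, first line: every column that is zero here is a candidate "zero column".
    NR==1   {for (i=1; i<=NF; i++) if ($i == 0) zerocol[i]=1; next}
    # Pass 1, remaining lines: drop a candidate once it holds a non-zero value.
    NR==FNR {for (idx in zerocol) if ($idx) delete zerocol[idx]; next}
    # Pass 2: p is set when the line has at least one non-zero field.
    {p=0; for (i=1; i<=NF; i++) if ($i) {p++; break}}
    # Print the non-zero lines, skipping the zero columns.
    p {for (i=1; i<=NF; i++) if (!(i in zerocol)) printf "%s%s", $i, OFS; print ""}
' file file

Output (each line ends with a trailing space, because OFS is printed after every field):

1 0 1 1 
1 1 1 1 
0 1 1 1 
1 1 0 0 
0 0 1 1 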

A Ruby program: Ruby has a nice Array method, transpose, which turns columns into rows.

#!/usr/bin/ruby

def remove_zeros(m)
  m.select {|row| row.detect {|elem| elem != 0}}
end

matrix = File.readlines(ARGV[0]).map {|line| line.split.map {|elem| elem.to_i}}
# remove zero rows
matrix = remove_zeros(matrix)
# remove zero rows from the transposed matrix, then re-transpose the result
matrix = remove_zeros(matrix.transpose).transpose
matrix.each {|row| puts row.join(" ")}
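
The two-step idea can be checked on the question's sample matrix without reading a file. A small sketch; `strip_zero_rows` is a hypothetical stand-in for `remove_zeros` written with `any?`:

```ruby
# Hypothetical stand-in for remove_zeros: keep the rows that have at least
# one non-zero element.
def strip_zero_rows(matrix)
  matrix.select { |row| row.any? { |elem| elem != 0 } }
end

sample = [
  [1, 0, 1, 0, 1],
  [0, 0, 0, 0, 0],
  [1, 1, 1, 0, 1],
  [0, 1, 1, 0, 1],
  [1, 1, 0, 0, 0],
  [0, 0, 0, 0, 0],
  [0, 0, 1, 0, 1],
]

rows_done = strip_zero_rows(sample)   # drops the 2nd and 6th rows
# After transposing, each column is a row, so the same filter drops the 4th column.
result = strip_zero_rows(rows_done.transpose).transpose
result.each { |row| puts row.join(" ") }
```

Note that `transpose` raises an IndexError when the rows differ in length, so this approach needs a rectangular table.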
glenn jackman answered Oct 16 '22 22:10