
Remove lines from a file corresponding to blank lines of another file

I have two files with the same number of rows and columns, delimited with ;. Example:

file_a:

1;1;1;1;1
2;2;2;2;2
3;3;3;3;3
4;4;4;4;4

file_b:

A;A;A;A;A
B;B;;;B
;;;;
D;D;D;D;D

Ignoring delimiters, line 3 of file_b is empty. So I want to remove line 3 from file_a as well, before running the command:

paste -d ';' file_a file_b

in order to have an output like this:

1;1;1;1;1;A;A;A;A;A
2;2;2;2;2;B;B;;;B
4;4;4;4;4;D;D;D;D;D

Edit: The number of columns is 93, the same for every row in both files, so both files have exactly the same shape (rows and columns).

Ahmet Said Akbulut asked Sep 24 '20 07:09



4 Answers

Please try the following, written and tested with the shown samples in GNU awk.

awk '
BEGIN{
  FS=OFS=";"
}
FNR==NR{
  arr[FNR]=$0
  next
}
!/^;+$/{
  print arr[FNR],$0
}
' file_a file_b

Explanation: a commented version of the above.

awk '                 ## Start of the awk program.
BEGIN{                ## BEGIN block, run before any input is read.
  FS=OFS=";"          ## Set the input and output field separators to ;.
}
FNR==NR{              ## True only while the first file (file_a) is being read.
  arr[FNR]=$0         ## Store the current line in arr, indexed by its line number.
  next                ## Skip the remaining rules for this line.
}
!/^;+$/{              ## For file_b: true when the line is not made up solely of ; characters.
  print arr[FNR],$0   ## Print the stored file_a line and the current line, joined by OFS.
}
' file_a file_b       ## Input files: file_a first, then file_b.
RavinderSingh13 answered Oct 20 '22 01:10


Since you mention that both files have the same number of lines, getline would fit here (f1 and f2 stand for file_a and file_b):

$ awk '(getline line < "f2")==1 && line ~ /[^;]/' f1
1;1;1;1;1
2;2;2;2;2
4;4;4;4;4

And you can do the paste functionality within awk as well:

$ awk '(getline line < "f2")==1 && line ~ /[^;]/{print $0 ";" line}' f1
1;1;1;1;1;A;A;A;A;A
2;2;2;2;2;B;B;;;B
4;4;4;4;4;D;D;D;D;D

The return value of getline is 1 if a line was read successfully. line ~ /[^;]/ checks whether the line contains any character other than ;. If both conditions are satisfied, the required result is printed.
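
As a minimal sketch of the same getline idea, the script below recreates the question's sample files and passes the second filename in through -v instead of hard-coding it. The scratch directory and the names other, line, and result are illustrative choices, not part of the answer above. Testing for > 0 rather than == 1 is equivalent here, since getline returns 1 on success, 0 at end of file, and -1 on error.

```shell
# Recreate the question's sample files in a scratch directory (illustrative setup).
dir=$(mktemp -d)
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$dir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$dir/file_b"

# For each line of file_a, read the matching line of file_b with getline.
# Print the joined pair only when the file_b line has a non-semicolon character.
result=$(awk -v other="$dir/file_b" '
  (getline line < other) > 0 && line ~ /[^;]/ { print $0 ";" line }
' "$dir/file_a")

printf '%s\n' "$result"
rm -rf "$dir"
```

With the sample data this should print the three joined lines requested in the question.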

Sundeep answered Oct 20 '22 00:10


Basically a modification of @RavinderSingh13's solution, but I only store the record numbers (NR) of the empty records:

$ awk '
NR==FNR {            # first file (file_b)
    if($0~/^;+$/)    # record consists only of semicolons
        a[NR]        # remember its record number
    next
}
!(FNR in a)          # second file (file_a): print lines whose number was not remembered
' file_b file_a

Output:

1;1;1;1;1
2;2;2;2;2
4;4;4;4;4
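
The command above prints only the filtered file_a. One possible way to reach the pasted output from the question is to drop the same rows from file_b with grep and hand both to paste. This is a sketch under the assumption that the data looks like the question's sample; the scratch directory and the names skip, file_a.kept, and result are hypothetical.

```shell
# Recreate the question's sample files in a scratch directory (illustrative setup).
dir=$(mktemp -d)
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$dir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$dir/file_b"

# Pass 1 (file_b): remember the line numbers of semicolon-only rows.
# Pass 2 (file_a): print only the lines whose number was not remembered.
awk '
  NR == FNR { if ($0 ~ /^;+$/) skip[NR]; next }
  !(FNR in skip)
' "$dir/file_b" "$dir/file_a" > "$dir/file_a.kept"

# Drop the same semicolon-only rows from file_b and join the two streams;
# the "-" operand makes paste read its second input from standard input.
result=$(grep -Ev '^;+$' "$dir/file_b" | paste -d ';' "$dir/file_a.kept" -)

printf '%s\n' "$result"
rm -rf "$dir"
```

Both filters use the same pattern, so the two inputs to paste keep the same number of lines.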
James Brown answered Oct 20 '22 00:10


Filtering after paste is easier. Assuming the lines to exclude look exactly as shown in the question, you can filter the output of paste with a grep pattern anchored to the end of the line. For the 5-column sample, a row that is empty in file_b ends in 5 semicolons after paste: the delimiter that paste inserts plus the 4 separators of the empty row.

paste -d ';' file_a file_b | grep -v ';;;;;$'

With the input files shown in the question, this prints exactly the requested output.

Edit:
To fulfill an additional requirement from a comment, the grep command can use a repetition count for the semicolons, equal to the number of columns per file. For different input files, simply change the number 5 accordingly.

paste -d ';' file_a file_b | grep -v ';\{5\}$'

If the number of columns is 93 as now specified in the question, the command would be

paste -d ';' file_a file_b | grep -v ';\{93\}$'

Edit2:
You can also derive the required semicolons from the first line of file_b:

SEMICOLONS=$(head -n 1 file_b | sed 's/[^;]*//g')
paste -d ';' file_a file_b | grep -v ";$SEMICOLONS"'$'

or, combined into one command:

paste -d ';' file_a file_b | grep -v ";$(head -n 1 file_b | sed 's/[^;]*//g')"'$'
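
A variant of the same idea, as a sketch only: count the columns of the first row with awk and use that number as the repetition count. It assumes the first line of file_b is non-empty and that every row has the same number of columns, as stated in the question; the scratch directory and the names cols and result are illustrative.

```shell
# Recreate the question's sample files in a scratch directory (illustrative setup).
dir=$(mktemp -d)
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$dir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$dir/file_b"

# Count the columns in the first row of file_b. An all-empty row of n columns
# contributes n-1 semicolons and paste adds one more, giving n in total.
cols=$(head -n 1 "$dir/file_b" | awk -F ';' '{ print NF }')

# Remove pasted lines that end in at least $cols semicolons.
result=$(paste -d ';' "$dir/file_a" "$dir/file_b" | grep -v ";\{$cols\}\$")

printf '%s\n' "$result"
rm -rf "$dir"
```

For the 93-column files described in the question, cols would come out as 93 without editing the command.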
Bodo answered Oct 20 '22 00:10