I have two files with the same number of rows and columns, delimited with ;. Example:
file_a:
1;1;1;1;1
2;2;2;2;2
3;3;3;3;3
4;4;4;4;4
file_b:
A;A;A;A;A
B;B;;;B
;;;;
D;D;D;D;D
Ignoring delimiters, line 3 of file_b is empty, so I want to remove line 3 from file_a as well before running the command
paste -d ';' file_a file_b
in order to get output like this:
1;1;1;1;1;A;A;A;A;A
2;2;2;2;2;B;B;;;B
4;4;4;4;4;D;D;D;D;D
Edit: The number of columns is 93, the same for every row and for both files, so both files have exactly the same matrix of rows and columns.
The \s metacharacter is not available in all awk implementations, but you can write !/^[ \t]*$/ instead. \s matches any whitespace character as defined by the current locale; think of it as shorthand for [[:space:]].
Could you please try the following, written and tested with the shown samples in GNU awk.
awk '
BEGIN{
  FS=OFS=";"
}
FNR==NR{
  arr[FNR]=$0
  next
}
!/^;+$/{
  print arr[FNR],$0
}
' file_a file_b
Explanation: Adding detailed explanation for above.
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS=";" ##Setting field separator and output field separator as ; here.
}
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when file_a is being read.
arr[FNR]=$0 ##Creating arr with index FNR and value is current line.
next ##next will skip all further statements from here.
}
!/^;+$/{ ##Checking condition if line does NOT consist solely of ; characters then do following.
print arr[FNR],$0 ##Printing arr with index of FNR and current line.
}
' file_a file_b ##Mentioning Input_file names here.
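To try this on the sample data from the question, here is a minimal self-contained sketch. The scratch directory and the variable name result are my own additions for illustration; the awk program itself is the one shown above.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$tmpdir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$tmpdir/file_b"

# Cache file_a by line number, then print the joined line for every
# file_b line that is not made up solely of semicolons.
result=$(awk '
BEGIN { FS = OFS = ";" }
FNR == NR { arr[FNR] = $0; next }
!/^;+$/ { print arr[FNR], $0 }
' "$tmpdir/file_a" "$tmpdir/file_b")

printf '%s\n' "$result"
```

Note that the whole of file_a is held in memory, which should be fine for moderately sized files.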
Since you mention that both files have the same number of lines, getline would fit here (f1 and f2 stand for file_a and file_b):
$ awk '(getline line < "f2")==1 && line ~ /[^;]/' f1
1;1;1;1;1
2;2;2;2;2
4;4;4;4;4
And you can do the paste functionality within awk as well:
$ awk '(getline line < "f2")==1 && line ~ /[^;]/{print $0 ";" line}' f1
1;1;1;1;1;A;A;A;A;A
2;2;2;2;2;B;B;;;B
4;4;4;4;4;D;D;D;D;D
The return value of getline is 1 if a line was read successfully. line ~ /[^;]/ checks whether the line contains any non-; character. If both conditions are satisfied, you can then print the required results.
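If you would rather not hard-code the second file name and want a message when the second file runs out early, something along these lines might work. This is only a sketch: the variable names other and joined are my own, and the sample files are recreated so the snippet runs on its own.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$tmpdir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$tmpdir/file_b"

# "other" is a hypothetical awk variable holding the path of the second file.
joined=$(awk -v other="$tmpdir/file_b" '
{
    # getline returns 1 on success, 0 at end of file and -1 on error.
    if ((getline line < other) <= 0) {
        print "could not read line " FNR " from " other | "cat 1>&2"
        exit 1
    }
}
# Keep the pair only if the line from the second file has a non-; character.
line ~ /[^;]/ { print $0 ";" line }
' "$tmpdir/file_a")

printf '%s\n' "$joined"
```

Only one line of each file is held in memory at a time, which may matter for large inputs.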
Basically a modification of @RavinderSingh13's solution but I only store the NR's of the empty records:
$ awk '
NR==FNR {               # process the b file
    if($0~/^;+$/)       # when empty record met
        a[NR]           # hash the record number NR
    next
}
!(FNR in a)             # print non-empty matches of a file
' fileb filea
Output:
1;1;1;1;1
2;2;2;2;2
4;4;4;4;4
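This prints only the filtered file_a. One way to get the combined output, sketched here under the assumption that a temporary file is acceptable (the names file_b.filtered and combined are mine), is to drop the same records from file_b with grep and hand the awk output to paste through standard input:

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$tmpdir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$tmpdir/file_b"

# Drop the empty records (semicolons only) from file_b.
grep -Ev '^;+$' "$tmpdir/file_b" > "$tmpdir/file_b.filtered"

# Drop the same record numbers from file_a, then paste the two together;
# "-" tells paste to read the awk output from standard input.
combined=$(awk '
NR == FNR {
    if ($0 ~ /^;+$/)
        a[NR]
    next
}
!(FNR in a)
' "$tmpdir/file_b" "$tmpdir/file_a" | paste -d ';' - "$tmpdir/file_b.filtered")

printf '%s\n' "$combined"
```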
Filtering after paste is easier. Assuming the format of the input lines to exclude is exactly as shown in the question, you can filter the output of paste with a grep pattern anchored to the end of the line (5 empty fields at the end of the line):
paste -d ';' file_a file_b | grep -v ';;;;;$'
With the input files shown in the question, this prints exactly the requested output.
Edit:
To fulfill an additional requirement from a comment, the grep command can be modified to specify the number of semicolons corresponding to the number of empty columns. For different input files, simply change the number 5 accordingly.
paste -d ';' file_a file_b | grep -v ';\{5\}$'
If the number of columns is 93 as now specified in the question, the command would be
paste -d ';' file_a file_b | grep -v ';\{93\}$'
Edit2:
You can also get the required number of semicolons from the first line of file_b:
SEMICOLONS=$(head -n 1 file_b | sed 's/[^;]*//g')
paste -d ';' file_a file_b | grep -v ";$SEMICOLONS"'$'
or combined into
paste -d ';' file_a file_b | grep -v ";$(head -n 1 file_b | sed 's/[^;]*//g')"'$'
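As a variation on the same idea, the count can be derived as a number rather than as a string of semicolons. A minimal sketch, assuming every row of file_b has the same number of fields as its first row (as stated in the question); the variable names ncols and filtered are my own:

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '1;1;1;1;1' '2;2;2;2;2' '3;3;3;3;3' '4;4;4;4;4' > "$tmpdir/file_a"
printf '%s\n' 'A;A;A;A;A' 'B;B;;;B' ';;;;' 'D;D;D;D;D' > "$tmpdir/file_b"

# Number of fields in the first row of file_b (5 for the sample data).
ncols=$(awk -F';' 'NR == 1 { print NF; exit }' "$tmpdir/file_b")

# An empty row contributes ncols-1 semicolons, plus the one added by paste,
# so a pasted line ending in ncols semicolons came from an empty row.
filtered=$(paste -d ';' "$tmpdir/file_a" "$tmpdir/file_b" | grep -v ";\{$ncols\}\$")

printf '%s\n' "$filtered"
```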