Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

awk / sed: replace all fields if any field matches a pattern

I have a tab-delimited file with at least 16 (but might be more) columns, where the first column is an unique identifier; and >10,000 rows (only 6x6 shown in example), like this:

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    4    4     4     -9    4
5    5    5     5     5     5
6    6    -9    6     6     6

I need to change all values of VAR1-5 into "-9" if one of the values is already "-9"

So, the desired output would be:

ID  VAR1  VAR2  VAR3  VAR4  VAR5
1    1    1     1     1     1
2    -9   -9    -9    -9    -9
3    3    3     3     3     3
4    -9   -9    -9    -9    -9
5    5    5     5     5     5
6    -9   -9    -9    -9    -9

So far, I've tried doing this in awk like this:

awk -F'\t' '
BEGIN{OFS="\t"}
{for(i=2;i<=NF;i++){if ($i=="-9"){for(j=2;j<=NF;j++){$j="-9"};continue}}};1
' < file1.tab

Which works, but is very slow when applied to the actual dataset. Is there a faster way to do this? Perhaps something with a combination of grep and sed?

like image 986
Hill Avatar asked Jan 02 '23 22:01

Hill


1 Answers

Here's a variation which doesn't hard-code the number of columns.

awk -F '\t' '/(^|\t)-9(\t|$)/ {
    printf $1; for(i=2; i<=NF; ++i) printf "\t-9"; printf "\n"
    next }
  1' file1 file2

The main optimization here is that Awk scans the entire line at once and triggers on the regex immediately, without needing to loop over all the fields unless it already knows that there is a match.

Because we know we will ditch all the fields except the first, there is no need to have Awk replace the fields so that it can then print them. Just generate the output we want to print and move on without touching Awk's internal representation of the line. This should buy a couple of cycles as well, though this is a very minor performance improvement.

like image 163
tripleee Avatar answered Jan 05 '23 15:01

tripleee