
Compare consecutive rows and multiple columns in awk and random select one of duplicate lines

Tags: bash, sed, awk

I read the question "Compare consecutive rows in awk/(or python) and random select one of duplicate lines". Now I have a follow-up question: how should I change the code if I want to do this comparison not only for the x-value, but also for the y-value, or for more columns? Maybe something like

if ($1 != prev) && ($2 != prev)  ???

In other words: I want to check whether the x-value AND the y-value of the current line are the same as the x-value AND the y-value of the following consecutive lines. If they are, one line of that group should be selected at random.

The data:

#x   y     z
1    1    11        
10   10   12       
10   10   17       
4    4    14
20   20   15        
20   88   16     
20   99   17
20   20   22
5    5    19
10   10   20

The output should look like:

#x   y     z
1    1    11        
10   10   17       
4    4    14
20   20   15        
20   88   16        
20   99   17    
20   20   22    
5    5    19
10   10   20

or (due to random selection)

#x   y     z
1    1    11        
10   10   12       
4    4    14
20   20   15        
20   88   16        
20   99   17    
20   20   22    
5    5    19
10   10   20
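
Both outputs share one property: no two consecutive lines have the same x-value and the same y-value. A minimal sketch for checking that property on a candidate output is shown here. The function name check_no_consecutive_dups is made up for illustration, and the fields are compared with awk's default comparison rules.

```shell
#!/bin/sh
# check_no_consecutive_dups is a hypothetical helper name, not part of the question.
# Usage: check_no_consecutive_dups FILE
# Exit status 0 if no two consecutive lines share both column 1 and column 2,
# exit status 1 otherwise.
check_no_consecutive_dups() {
    awk 'NR > 1 && $1 == p1 && $2 == p2 { bad = 1 }   # same x AND same y as the previous line
         { p1 = $1; p2 = $2 }                          # remember this line for the next comparison
         END { exit bad + 0 }' "$1"
}
```

For example, `check_no_consecutive_dups out.txt && echo ok` prints ok only when the file (out.txt is an assumed name) contains no such consecutive pair.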

The code from the question linked above, which does this for the x-values, but NOT for the x- and y-values combined in an AND condition:

$ cat tst.awk
function prtBuf(        idx) {
    if (cnt > 0) {
        idx = int((rand() * cnt) + 1)
        print buf[idx]
    }
    cnt = 0
}

BEGIN { srand() }
$1 != prev { prtBuf() }
{ buf[++cnt]=$0; prev=$1 }
END { prtBuf() }
Jojo asked Oct 30 '22 23:10


1 Answer

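This should do it. Two lines belong to the same group only if the x-value AND the y-value both match, so the buffer has to be flushed as soon as either column differs. That is why the condition uses || rather than &&: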

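# idx is a local variable: it is declared as an extra parameter and never passed
function prtBuf(        idx) {
    if (cnt > 0) {
        # pick one of the buffered lines at random (index 1..cnt)
        idx = int((rand() * cnt) + 1)
        print buf[idx]
    }
    cnt = 0
}

BEGIN { srand() }
# flush the buffer when either the x-value or the y-value changes
$1 != prev1 || $2 != prev2 { prtBuf() }
# buffer the current line and remember its x- and y-values
{ buf[++cnt]=$0; prev1=$1; prev2=$2 }
END { prtBuf() }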
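
For the "or more columns" part of the question, one possible generalization is to join the first N fields into a single key and compare that key with the previous line's key. The following is only a sketch under a few assumptions: the wrapper name dedup_consecutive and the variable ncols are made up for illustration, and because the key is a concatenated string, fields are compared as strings (so 10 and 10.0 would count as different, unlike the comparison awk may apply to two numeric-looking fields).

```shell
#!/bin/sh
# dedup_consecutive is a hypothetical wrapper name; ncols is an assumed parameter.
# Usage: dedup_consecutive NCOLS FILE
# Prints one randomly chosen line from each run of consecutive lines
# whose first NCOLS fields are identical.
dedup_consecutive() {
    awk -v ncols="$1" '
    function prtBuf(        idx) {
        if (cnt > 0) {
            idx = int((rand() * cnt) + 1)   # random index in 1..cnt
            print buf[idx]
        }
        cnt = 0
    }
    BEGIN { srand() }
    {
        # build a string key from the first ncols fields
        key = ""
        for (i = 1; i <= ncols; i++)
            key = key SUBSEP $i
        # a different key starts a new group, so flush the old one
        if (key != prev)
            prtBuf()
        buf[++cnt] = $0
        prev = key
    }
    END { prtBuf() }
    ' "$2"
}
```

With the data from the question saved in a file (data.txt is an assumed name), `dedup_consecutive 2 data.txt` compares x and y, and `dedup_consecutive 1 data.txt` behaves like the original x-only script.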
Andrzej Pronobis answered Nov 08 '22 09:11