I read the question "Compare consecutive rows in awk/(or python) and random select one of duplicate lines". Now I have a follow-up question: how should I change the code if I want to do this comparison not only on the x-value, but also on the y-value, or on more columns? Maybe something like
if ($1 != prev) && ($2 != prev) ???
In other words: I want to check whether the x-value AND the y-value of the current line are both the same as the x-value AND the y-value of the next consecutive lines, and if so, randomly keep only one of those lines.
The data:
#x y z
1 1 11
10 10 12
10 10 17
4 4 14
20 20 15
20 88 16
20 99 17
20 20 22
5 5 19
10 10 20
The output should look like:
#x y z
1 1 11
10 10 17
4 4 14
20 20 15
20 88 16
20 99 17
20 20 22
5 5 19
10 10 20
or (due to random selection)
#x y z
1 1 11
10 10 12
4 4 14
20 20 15
20 88 16
20 99 17
20 20 22
5 5 19
10 10 20
The code from the question mentioned above, which does this for the x-values only, but NOT for the x- and y-values combined in an AND condition:
$ cat tst.awk
function prtBuf(    idx) {
    if (cnt > 0) {
        idx = int((rand() * cnt) + 1)
        print buf[idx]
    }
    cnt = 0
}
BEGIN { srand() }
$1 != prev { prtBuf() }
{ buf[++cnt]=$0; prev=$1 }
END { prtBuf() }
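This should do it. Two lines belong to the same group only if x AND y both match, so the buffer has to be flushed as soon as x OR y differs from the previous line. That is why the test uses || rather than &&, and why each column gets its own prev variable: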
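# Print one randomly chosen buffered line, then empty the buffer.
function prtBuf(    idx) {    # idx is a local variable
    if (cnt > 0) {
        idx = int((rand() * cnt) + 1)
        print buf[idx]
    }
    cnt = 0
}
BEGIN { srand() }
# Flush when x or y changes.
$1 != prev1 || $2 != prev2 { prtBuf() }
{ buf[++cnt]=$0; prev1=$1; prev2=$2 }
END { prtBuf() }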
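Since the question also allows Python, here is a rough equivalent that works for any number of key columns. Treat it as a sketch, not a drop-in replacement: the function name pick_one_per_run and the parameters key_cols and rng are made up for this example, it assumes whitespace-separated fields with every line having at least as many fields as the highest key column, and it compares the fields as strings (so "10" and "10.0" would count as different, unlike awk's numeric comparison).

```python
import random
from itertools import groupby


def pick_one_per_run(lines, key_cols=(0, 1), rng=random):
    """Yield one randomly chosen line from each run of consecutive
    lines whose key columns are all equal (compared as strings)."""

    def key(line):
        fields = line.split()
        # Tuple of the selected columns; two lines are in the same run
        # only if every selected column matches.
        return tuple(fields[i] for i in key_cols)

    # groupby only merges *consecutive* items with equal keys, which is
    # the same behaviour as the prev1/prev2 test in the awk script.
    for _, run in groupby(lines, key=key):
        yield rng.choice(list(run))


data = """#x y z
1 1 11
10 10 12
10 10 17
4 4 14
20 20 15
20 88 16
20 99 17
20 20 22
5 5 19
10 10 20""".splitlines()

for line in pick_one_per_run(data):
    print(line)
```

To compare more columns, pass e.g. key_cols=(0, 1, 2). The header line forms a run of its own, so it is printed unchanged, just as in the awk version.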