I have the following text file for which I need to compare values from each line, namely items 2-4 against items 5-7. I'm stuck with bash/awk/sed on this one.
Sample data:
[hartford tmp]$ cat flist
a1 1 2 3 x y z
b1 3 2 1 z y x
c1 1 2 3 1 2 3
d1 4 5 6 6 5 4
e1 a b c a b c
f1 x y z x y z
It works with the following script, but it's just unbearably slow, probably because of all the echo calls.
[hartford tmp]$ cat pdelta.sh
#!/bin/bash
cat flist |while read rec; do
f1="$(echo $rec | awk '{ print $1 }')"
f2="$(echo $rec | awk '{ print $2 }')"
f3="$(echo $rec | awk '{ print $3 }')"
f4="$(echo $rec | awk '{ print $4 }')"
f5="$(echo $rec | awk '{ print $5 }')"
f6="$(echo $rec | awk '{ print $6 }')"
f7="$(echo $rec | awk '{ print $7 }')"
if [[ "x${f2} x${f3} x${f4}" != "x${f5} x${f6} x${f7}" ]]; then
echo "$f1 DOES NOT MATCH"
fi
done
When run, the output is exactly what I'm looking for but it's too slow when dealing with a file that's 50k+ lines long.
[hartford]$ ./pdelta.sh
a1 DOES NOT MATCH
b1 DOES NOT MATCH
d1 DOES NOT MATCH
What is a more efficient way to accomplish this?
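You can use awk to output the IDs of all the matching lines. Note that this prints the lines that match; negate the condition to list the mismatches the question asks for: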
awk '{ if ($2 == $5 && $3 == $6 && $4 == $7) { print $1 } }' < flist
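To get the same report as the original script (the IDs that do not match), the test can be inverted. The following is a minimal sketch, not a definitive implementation: pdelta is a hypothetical function name, and it assumes whitespace-separated fields as in the sample data. Joining the fields with FS forces a string comparison, which mirrors the string test in the bash script; a plain $2 != $5 in awk may compare numerically when both fields look like numbers (for example, 01 and 1 would be treated as equal).

```shell
#!/bin/bash
# Sketch: report rows whose fields 2-4 differ from fields 5-7.
# pdelta is a hypothetical helper name; the input file is passed as $1.
pdelta() {
    # Concatenating fields yields strings, so this is a string comparison,
    # like the [[ ... != ... ]] test in the original script.
    awk '($2 FS $3 FS $4) != ($5 FS $6 FS $7) { print $1, "DOES NOT MATCH" }' "$1"
}

# Run against flist only when it exists in the current directory.
if [[ -f flist ]]; then
    pdelta flist
fi
```

On the sample flist this should report a1, b1 and d1, the same lines as pdelta.sh. Because a single awk process reads the whole file, it avoids starting seven awk subprocesses per line, which is the likely reason the original loop is slow on 50k+ lines.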