Let's say I have a file with several million lines, organized like this:
@1:N:0:ABC
XYZ
@1:N:0:ABC
ABC
I am trying to write a one-line grep/sed/awk matching function that returns both lines if the string after the last colon on the first line (ABC in the sample above) is found in the second line.
When I try to use grep -A1 -P and pipe the matches with a match like '(?<=:)[A-Z]{3}', I get stuck. I think my creativity is failing me here.
With awk
$ awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}' ip.txt
@1:N:0:ABC
ABC
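-F:            use : as the field delimiter, which makes it easy to get the last column
s=$NF; p=$0    save the last column value and the entire line for printing later
NF==1          true if the line doesn't contain :
$0 ~ s         true if the line contains the last column data saved previously
If the saved value may contain regex metacharacters, use index($0,s) instead of $0 ~ s to search literally.
Note that this assumes each line containing : is followed by a line which doesn't have :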
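The literal-search variant with index($0,s) can be sketched as follows. This is only an illustration: the function name match_literal and the sample data (a tag containing the regex metacharacter .) are made up for the example and are not from the question.

```shell
# match_literal is a hypothetical helper name, not part of the original answer.
# It reads lines on stdin and prints a pair when the text after the last : of
# the previous line occurs literally in the current colon-free line.
match_literal() {
    awk -F: 'NF==1 && index($0, s) {print p ORS $0} {s=$NF; p=$0}'
}

# Invented sample: the tag A.C contains a regex metacharacter.
printf '%s\n' '@1:N:0:A.C' 'ABC' '@1:N:0:A.C' 'XA.CX' | match_literal
```

With $0 ~ s the first pair would be printed as well, because . matches any character and so A.C matches ABC. With index() only the second pair, where A.C appears verbatim, is printed.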
With GNU sed (might work with other versions too, though the syntax might differ)
$ sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' ip.txt
@1:N:0:ABC
ABC
/:/                  if the line contains :
N                    append the next line to the pattern space
/.*:(.*)\n.*\1/      capture the string after the last : and check whether it is present in the next line
Again, this assumes input like that shown in the question. It won't work for cases like
@1:N:0:ABC
@1:N:0:XYZ
XYZ
This might work for you (GNU sed):
sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file
Use the grep-like option -n so that lines are printed only explicitly. Read two lines into the pattern space (N) and print both if they meet the requirements (p). Then always delete the first line (D) and repeat with the remaining one.
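As a rough check of how this two-line sliding window behaves, the sketch below runs the same sed command on input where two header lines appear in a row. It assumes GNU sed; the function name window_match is hypothetical and used only for this illustration.

```shell
# window_match is a hypothetical wrapper around the sed command above.
# N appends the next line, p prints the pair if the text captured after a :
# on the first line reappears on the second line, and D drops the first line
# so the second line starts the next window.
window_match() {
    sed -n 'N;/.*:\(.*\)\n.*\1/p;D'
}

# Two headers in a row: only the second header pairs with the XYZ line.
printf '%s\n' '@1:N:0:ABC' '@1:N:0:XYZ' 'XYZ' | window_match
```

Because D keeps the second line for the next cycle, the @1:N:0:XYZ header is still paired with the XYZ line that follows it. One caveat of this approach is that two consecutive headers ending in the same tag would also be printed as a match.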