
Multiline pattern matching in bash

I have a long log file of the following form:

Processin SCRIPT10 file..
Submitted batch job 1715572
Processin SCRIPT100 file..
Processin SCRIPT1000 file..
Submitted batch job 1715574
Processin SCRIPT10000 file..
Processin SCRIPT10001 file..
Processin SCRIPT10002 file..
Submitted batch job 1715577
Processin SCRIPT10003 file..
Submitted batch job 1715578
Processin SCRIPT10004 file..
Submitted batch job 1715579

I want to find the jobs (script names) that were not submitted, that is, the "Processin" lines that are not immediately followed by a "Submitted batch job" line.

So far I have tried to do that task using

pcregrep -M "Processin.*\n.*Processin" execScripts2.log | awk 'NR % 2 == 0'

But it does not properly handle the case where several scripts in a row are not submitted. Surprisingly, it outputs only the SCRIPT1000 and SCRIPT10001 lines. Can you show me a better one-liner?

Ideally the output would be only the lines that have no 'Submitted' on the next line (or just the script names), that is:

SCRIPT100
SCRIPT10000
SCRIPT10001

Thanks.

VojtaK asked May 24 '17 09:05


1 Answer

This awk command does the job:

awk -v s='Submitted' '$1 != s{if(p != "") print p; p=$2} $1 == s{p=""}' file

SCRIPT100
SCRIPT10000
SCRIPT10001
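
The one-liner keeps the most recent script name in p, clears p when a "Submitted" line arrives, and prints p whenever another line shows up while a name is still pending. One case it does not cover is a log whose last "Processin" line has no "Submitted" line after it. A minimal sketch that adds an END block for that case follows; the sample log and the names log and unsubmitted are made up for illustration, and it assumes the script name is always the second field.

```shell
# Hypothetical sample log: same format as in the question, plus a trailing
# "Processin" line that is never followed by a "Submitted" line.
log=$(mktemp)
cat > "$log" <<'EOF'
Processin SCRIPT10 file..
Submitted batch job 1715572
Processin SCRIPT100 file..
Processin SCRIPT1000 file..
Submitted batch job 1715574
Processin SCRIPT10004 file..
EOF

# p holds the script name that is still waiting for its "Submitted" line.
unsubmitted=$(awk -v s='Submitted' '
    $1 == s { p = ""; next }           # job confirmed, nothing pending
    { if (p != "") print p; p = $2 }   # flush the pending name, remember the new one
    END { if (p != "") print p }       # last script had no "Submitted" line
' "$log")

printf '%s\n' "$unsubmitted"
rm -f "$log"
```

On this sample the sketch prints SCRIPT100 and SCRIPT10004. Without the END block only SCRIPT100 would be reported.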

Reference: Effective AWK Programming

anubhava answered Sep 24 '22 10:09