Grab nth occurrence in between two patterns using awk or sed

Tags: shell, sed, awk

I want to parse the contents of a file and grab the nth occurrence of the text between two patterns, preferably using awk or sed. The input looks like this:

category
1
s
t
done
category
2
n
d
done
category
3
r
d
done
category
4
t
h
done

For this example, say I want to grab the third occurrence of the text between category and done. The output would be:

category
3
r
d
done

Dan Lawless asked Nov 08 '12 02:11

3 Answers

This might work for you (GNU sed):

sed -n '/category/{:a;N;/done/!ba;x;s/^/x/;/^x\{3\}$/{x;p;q};x}' file

Turn off automatic printing by using the -n option. Gather up lines between category and done. Store a counter in the hold space and when it reaches 3 print the collection in the pattern space and quit.

Or if you prefer awk:

awk  '/^category/,/^done/{if(++m==1)n++;if(n==3)print;if(/^done/)m=0}'  file
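
The awk one-liner above can also be spelled out with named state, which may be easier to adapt. This is only a sketch: the function name nth_block and the variables want, count and inside are illustrative choices, not part of the answer, and it assumes blocks open at a line starting with category and close at a line starting with done.

```shell
# nth_block N [FILE...]: print the N-th category..done block.
# Hypothetical helper; reads standard input when no FILE is given.
nth_block() {
    n=$1
    shift
    awk -v want="$n" '
        /^category/ && !inside { inside = 1; count++ }  # a new block opens
        inside && count == want { print }               # emit only the wanted block
        /^done/ && inside {
            inside = 0                                  # the block closes
            if (count == want) exit                     # nothing more to print
        }
    ' "$@"
}
```

Calling nth_block 3 file on the sample input should print the five lines from the third category through its done.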

potong answered Oct 23 '22 19:10

Try this:

awk -v n=3 '/^category/{l++} (l==n){print}' file.txt

Or, more cryptically:

awk -v n=3 '/^category/{l++} l==n' file.txt

If your file is big, exit as soon as the next category line is reached:

awk -v n=3 '/^category/{l++} l>n{exit} l==n' file.txt
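
One caveat with these commands: they print everything from the nth category line up to the next category line, so they match the expected output only when each done is immediately followed by category, as in the question's sample. The sketch below uses a made-up input (the sample function and its stray line are assumptions for illustration) to show the difference, along with a variant that stops at done.

```shell
# Hypothetical input: a stray line sits between the first and second blocks.
sample() { printf '%s\n' category 1 done stray category 2 done; }

# Form from the answer: also prints "stray", since nothing stops at "done".
sample | awk -v n=1 '/^category/{l++} l>n{exit} l==n'

# Variant: exit as soon as the closing "done" of block n has been printed.
sample | awk -v n=1 '/^category/{l++} l==n{print; if (/^done/) exit}'
```

The variant also stops reading early, so it keeps the benefit of the exit version on big files.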

Gilles Quenot answered Oct 23 '22 21:10

If your file doesn't contain any null characters, here's one way using GNU sed. This finds the third occurrence of a pattern range, but you can easily modify it to get any occurrence you'd like.

sed -n '/^category/ { x; s/^/\x0/; /^\x0\{3\}$/ { x; :a; p; /done/q; n; ba }; x }' file.txt

Results:

category
3
r
d
done

Explanation:

Turn off default printing with the -n switch. Match the word 'category' at the start of a line. Swap the pattern space with the hold space and prepend a null character to the pattern space, which now holds the counter. In the example, if the counter then consists of exactly three null characters, swap again to get the matched line back from the hold space. Now loop: print the contents of the pattern space, and if it matches the closing pattern ('done'), quit. Otherwise, read in the next line of input and repeat the loop. If the counter has not reached three, swap again to restore the line and put the counter back in the hold space.
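
The steps above can be laid out one command per line, with the occurrence number passed in from the shell. This is a sketch under a few assumptions: the wrapper name nth_range is made up, the counter is kept as a run of x characters (as in the first answer) instead of null characters, and it has only been reasoned through for GNU sed.

```shell
# nth_range N [FILE...]: hypothetical wrapper around the sed approach.
nth_range() {
    n=$1
    shift
    sed -n '
        /^category/ {
            # bring the counter into the pattern space and add one mark
            x
            s/^/x/
            # counter holds exactly N marks: this is the wanted block
            /^x\{'"$n"'\}$/ {
                # swap the "category" line back in
                x
                :a
                # print the current line, quit after the closing pattern
                p
                /^done/q
                # otherwise load the next line and loop
                n
                ba
            }
            # not the N-th block yet: put the counter back in the hold space
            x
        }
    ' "$@"
}
```

For the sample input, nth_range 3 file.txt should give the same five lines shown under Results.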

Steve answered Oct 23 '22 20:10