I'd like to use sed to match blocks of lines delimited by pattern1/pattern2, and then perform operations (e.g. print the block) only on blocks which contain pattern3.
In the example below, I'm looking for "catch me if you can", inside all blocks delimited by lines matching { and } (and then I want to print the matching blocks in their entirety).
What I've tried:
sed -n -e '/{/,/}/{1h;1!{$!{H;d};H;x;/catch me if you can/p}}'
(The idea is to match blocks delimited by { and }, then accumulate each block into the hold space; at the end of each block, exchange the hold space and perform matching for "catch me if you can"). This doesn't work, because all matched blocks together are treated as a single block by sed, instead of each block being treated individually.
Input data:
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block2": {
"bbb": "24680",
"bar": "blah",
"foo": "argh",
"ccc": "135"
},
"block3": {
"ddd": "zzz"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
}
Desired output:
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can"
},
Note 1: The order of the fields inside each block is random. The number of fields and the length of the values are not constant across blocks. The field I'm looking for may be missing in some blocks (as opposed to just having a different value).
Note 2: For educational purposes, I'd prefer the solution to use sed, but if that's not possible, awk or bash are fine as well. Please no perl or other tools.
This is how I'd do it. There are two versions here, one for BSD (Mac OS X) sed (also applicable to other systems not running GNU sed), and one for GNU sed.
BSD sed
$ cat script.bsd-sed
/{/,/}/{
    /{/{ h; b next
    }
    /}/{ H; x; /catch me if you can/p; b next
    }
    H
    :next
}
$ sed -n -f script.bsd-sed data
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
}
$
The logic is:
- Don't print by default (-n).
- For lines in a block delimited by { and }:
  - If the line matches {, copy the pattern space over the hold space and jump to label next.
  - If the line matches }, add it to the hold space; switch the pattern and hold space; if the pattern space (previously hold space) matches your other pattern ('catch me if you can'), print it; jump to label next.
  - Otherwise, append the line to the hold space.
BSD (classic) sed requires that nothing follow b next on the same line, so the closing } of each action block is on the next line.
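For comparison, the same accumulate-then-test logic can be sketched in awk, which the question allows as a fallback. This is only an illustrative sketch, not part of the sed solution above: the variable names buf and inblock and the small sample input are made up for the demonstration, and the comments map each rule onto the sed command it loosely corresponds to.

```shell
# Hypothetical sample input: two blocks, only the first contains the pattern.
input='"a": {
    "k": "catch me if you can"
},
"b": {
    "k": "other"
}'

result=$(printf '%s\n' "$input" | awk '
# Opening line: start a fresh buffer (roughly what h does in the sed script).
/[{]/ { buf = $0; inblock = 1; next }

# Closing line: append it, then test the whole block (roughly H; x; /pattern/p).
inblock && /[}]/ {
    buf = buf "\n" $0
    if (buf ~ /catch me if you can/) print buf
    inblock = 0
    next
}

# Any other line inside a block: append it to the buffer (roughly H).
inblock { buf = buf "\n" $0 }
')

printf '%s\n' "$result"
```

With this sample, only the block named "a" should be printed, because the buffer is reset at every opening line and tested once per closing line.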
GNU sed
$ cat script.gnu-sed
/{/,/}/{
    /{/{ h; b next }
    /}/{ H; x; /catch me if you can/p; b next }
    H
    :next
}
$ /opt/gnu/bin/sed -n -f script.gnu-sed data
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
}
$
GNU sed recognizes semicolons or close braces after the label as terminating the command, so it allows more compact notation. You could even flatten it all into a single line; you just have to add a few semicolons:
$ /opt/gnu/bin/sed -n -e '/{/,/}/{ /{/{ h; b next }; /}/{ H; x; /catch me if you can/p; b next }; H; :next }' data
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
}
$
You can remove the spaces not in the pattern match too:
$ /opt/gnu/bin/sed -n -e '/{/,/}/{/{/{ h;b next};/}/{H;x;/catch me if you can/p;b next};H;:next}' data
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
}
$
The test file, data:
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block2": {
"bbb": "24680",
"bar": "blah",
"foo": "argh",
"ccc": "135"
},
"block3": {
"ddd": "zzz"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
}
"block5": [
"oops": "catch me if you can"
],
"block6": {
"rhubarb": "dandelion"
}
$ sed -n '/^"/{x;/catch/p;d}; ${H;x;/catch/p;d}; H' file
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
}
-n
This option tells sed not to print anything unless we ask.
/^"/{x;/catch/p;d}
For any line that begins with a quote, this (1) exchanges the pattern and hold spaces, (2) checks whether what is now in the pattern space contains catch and, if so, prints it, and (3) deletes the pattern space, so sed starts over with the next line. This assumes the field lines inside each block are indented, so that only block header lines begin with a quote.
${H;x;/catch/p;d}
When we reach the last line, we do something similar: we append the last line to the hold space, swap the hold space into the pattern space, check whether it contains catch and, if so, print it. Then the pattern space is deleted.
H
In any other case, the line is appended to the hold space.
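The steps above can be tried on a small inline sample. This is a minimal sketch, assuming the field lines are indented so that only block headers start with a quote; the sample input and the variable names sample and printed are made up for illustration. The script is spread over several lines, one command per line, so that it should also run under a non-GNU sed.

```shell
# Hypothetical sample: field lines are indented, block headers are not.
sample='"a": {
    "k": "catch me"
},
"b": {
    "k": "other"
}'

printed=$(printf '%s\n' "$sample" | sed -n '
# Block header: swap in the previously collected block, print it if it matches.
/^"/{
x
/catch/p
d
}
# Last line: complete the final block, then test it the same way.
${
H
x
/catch/p
d
}
# Any other line: append to the block being collected in the hold space.
H
')

printf '%s\n' "$printed"
```

Each block is only tested when the next header (or the last line) arrives, which is why the last-line case needs its own rule.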
$ awk '/catch/{print $0 "},"}' RS='}' file
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
,
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
},
Jonathan Leffler adds the possibility of square bracket blocks in addition to curly brace blocks, as shown in his test file data. In that case, for sed, try:
$ sed -n '/^"/{x;/{.*catch/p;d}; ${H;x;/{.*catch/p;d}; H' data
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
}
And for awk:
$ awk '{s=(s?s"\n":"") $0} /{/{f=1} f && /catch/{f=2} /^[]}]/{if (f==2) print s; f=0; s=""} ' data
"block1": {
"foo": "abcd",
"bar": "catch me if you can",
"aaa": "12345"
},
"block4": {
"foo": "xyz",
"bar": "catch me if you can",
}
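The awk one-liner above is dense, so here is the same program spread over several lines with comments. It is a sketch for reading purposes only; the shortened sample input and the shell variable names data_sample and matches are made up for illustration, while s and f are the names used in the one-liner.

```shell
# Hypothetical, shortened sample: one matching curly-brace block,
# one square-bracket block that must be ignored, one non-matching block.
data_sample='"block1": {
    "bar": "catch me if you can"
},
"block5": [
    "oops": "catch me if you can"
],
"block6": {
    "rhubarb": "dandelion"
}'

matches=$(printf '%s\n' "$data_sample" | awk '
# Append every line to the buffer s, separated by newlines.
{ s = (s ? s "\n" : "") $0 }

# An opening curly brace marks the start of a block we care about.
/{/ { f = 1 }

# The pattern was seen inside a curly-brace block.
f && /catch/ { f = 2 }

# A line starting with ] or } closes the block: print if flagged, then reset.
/^[]}]/ {
    if (f == 2) print s
    f = 0
    s = ""
}
')

printf '%s\n' "$matches"
```

The flag f stays at 0 for the square-bracket block, so its "catch" line never promotes it to 2 and the block is discarded when the closing ] resets the buffer.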