
sed: print delimited block of lines if it matches a pattern

Tags: bash, sed, awk

I'd like to use sed to match blocks of lines delimited by pattern1/pattern2, and then perform operations (e.g. print the block) only on blocks which contain pattern3.

In the example below, I'm looking for "catch me if you can", inside all blocks delimited by lines matching { and } (and then I want to print the matching blocks in their entirety).

What I've tried:

sed -n -e '/{/,/}/{1h;1!{$!{H;d};H;x;/catch me if you can/p}}'

(The idea is to match blocks delimited by { and }, accumulate each block in the hold space, and at the end of each block exchange the hold and pattern spaces and test for "catch me if you can".) This doesn't work: sed treats all the matched blocks together as a single block instead of handling each block individually.
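A quick experiment shows one reason the attempt misbehaves: inside a /{/,/}/ range, the addresses 1 and $ still mean the first and last lines of the whole input, not of the current block. So the hold space is initialised only on input line 1 and exchanged only on the last input line. The two-block input below is made up for illustration.

```shell
#!/bin/sh
# Made-up two-block input, for illustration only.
# Inside the range, "1" and "$" are whole-input line addresses,
# so only the very first and very last input lines get tagged.
printf 'a {\nx\n}\nb {\ny\n}\n' |
    sed -n '/{/,/}/{
1s/^/FIRST: /
$s/^/LAST: /
p
}'
```

Only the first line gets FIRST: and only the final } gets LAST:, even though the input contains two blocks.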

Input data:

"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block2": {
    "bbb": "24680",
    "bar": "blah",
    "foo": "argh",
    "ccc": "135"
},
"block3": {
    "ddd": "zzz"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}

Desired output:

"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}

Note 1: The order of the fields inside each block is random. The number of fields and the length of the values are not constant across blocks. The field I'm looking for may be missing in some blocks (as opposed to just having a different value).

Note 2: For educational purposes, I'd prefer the solution to use sed, but if that's not possible, awk or bash are fine as well. Please no perl or other tools.

Sir Athos asked May 20 '16 at 23:05


2 Answers

This is how I'd do it. There are two versions here, one for BSD (Mac OS X) sed (also applicable to other systems not running GNU sed), and one for GNU sed.

BSD sed

$ cat script.bsd-sed
/{/,/}/{
    /{/{ h; b next
    }
    /}/{ H; x; /catch me if you can/p; b next
    }
    H
    :next
}
$ sed -n -f script.bsd-sed data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
$

The logic is:

  • Don't print anything unless told to do so (-n).
  • Only act on lines in a range that starts at a line containing { and ends at the next line containing }.
  • If the line matches {, copy the pattern space over the hold space (starting a new block) and jump to label next.
  • If the line matches }, append it to the hold space; swap the pattern and hold spaces; if the pattern space (previously the hold space) matches your other pattern ('catch me if you can'), print it; jump to label next.
  • Otherwise, append the line to the hold space.
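Here is a sketch of the same script written one command per line, with comments mapping each part to the steps above. The input is a trimmed copy of the sample data fed through a here-document, and it assumes your sed accepts # comment lines inside a { } group (GNU sed does).

```shell
#!/bin/sh
# Sketch: the script.bsd-sed logic, one command per line.
# Input: a trimmed copy of the sample data from the question.
sed -n '
/{/,/}/{
    # Opening line: overwrite the hold space, starting a new block.
    /{/{
        h
        b next
    }
    # Closing line: append it, swap, and print the block if it matches.
    /}/{
        H
        x
        /catch me if you can/p
        b next
    }
    # Any other line inside the block: append it to the hold space.
    H
    :next
}
' <<'EOF'
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block2": {
    "bbb": "24680",
    "bar": "blah"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
EOF
```

It prints block1 and block4 unchanged, as in the session above.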

BSD (classic) sed treats everything after b up to the end of the line as the label name, so nothing else may follow b next on that line. That is why the closing } of each action group is on the next line.
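If you want the BSD-compatible version on a single command line, one option is to split the script across several -e arguments, since BSD and GNU sed both treat the end of each -e argument as the end of a script line. A minimal sketch with a made-up two-block input:

```shell
#!/bin/sh
# Each -e argument ends a script line, so "b next" is never
# followed by anything else on its line.
printf '%s\n' '"a": {' '    "bar": "catch me if you can"' '},' \
              '"b": {' '    "bar": "blah"' '}' |
sed -n -e '/{/,/}/{' \
       -e '/{/{ h; b next' -e '}' \
       -e '/}/{ H; x; /catch me if you can/p; b next' -e '}' \
       -e 'H' -e ':next' -e '}'
```

Only the "a" block is printed, since it is the only one containing the pattern.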

GNU sed

$ cat script.gnu-sed 
/{/,/}/{
    /{/{ h; b next }
    /}/{ H; x; /catch me if you can/p; b next }
    H
    :next
}
$ /opt/gnu/bin/sed -n -f script.gnu-sed data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
$

GNU sed treats a semicolon or closing brace after the label name as the end of the command, which allows a more compact notation. You can even flatten it all onto a single line, though you have to add a few semicolons:

$ /opt/gnu/bin/sed -n -e '/{/,/}/{ /{/{ h; b next }; /}/{ H; x; /catch me if you can/p; b next }; H; :next }' data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
$

You can also remove most of the spaces that are not part of a pattern:

$ /opt/gnu/bin/sed -n -e '/{/,/}/{/{/{ h;b next};/}/{H;x;/catch me if you can/p;b next};H;:next}' data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
$

Extended data file (data), used in the runs above:

"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block2": {
    "bbb": "24680",
    "bar": "blah",
    "foo": "argh",
    "ccc": "135"
},
"block3": {
    "ddd": "zzz"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
"block5": [
    "oops": "catch me if you can"
],
"block6": {
    "rhubarb": "dandelion"
}
Jonathan Leffler answered Oct 03 '22 at 01:10


Using sed

$ sed -n '/^"/{x;/catch/p;d}; ${H;x;/catch/p;d}; H' file
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}

How it works

  • -n

    This option tells sed not to print anything unless we explicitly ask (here, with p).

  • /^"/{x;/catch/p;d}

    For any line that begins with a quote (the first line of a new block), this (1) exchanges the pattern and hold spaces, so the previous block moves into the pattern space and the new line is saved in the hold space; (2) prints the pattern space if it contains catch; and (3) deletes the pattern space, so sed starts the next cycle with the next line.

  • ${H;x;/catch/p;d}

    When we reach the last line, we do something similar: we append the last line to the hold space, swap the hold space into the pattern space, check whether it contains catch and, if so, print it. Then the pattern space is deleted.

  • H

    For any other case, the line is appended to the hold space.
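The same script can be spelled out one command per line, which makes the three parts above easier to see and avoids placing } directly after a command on the same line (some non-GNU seds reject that). A sketch with a made-up three-block input:

```shell
#!/bin/sh
# Made-up input: blocks "a" and "c" contain the pattern, "b" does not.
printf '%s\n' '"a": {' '    "bar": "catch me if you can"' '},' \
              '"b": {' '    "bar": "blah"' '},' \
              '"c": {' '    "bar": "catch me if you can"' '}' |
sed -n '
/^"/{
    x
    /catch/p
    d
}
${
    H
    x
    /catch/p
    d
}
H
'
```

This prints the "a" and "c" blocks: "a" when the line starting "b" swaps it out of the hold space, and "c" at the last line.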

Using awk

$ awk '/catch/{print $0 "},"}' RS='}' file
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
,
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
},
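The lone , line appears because, with RS='}', the comma that follows each } becomes the start of the next record. A small variation, not part of the original answer, strips that leading comma and newline before printing (like the original, it still appends }, to every printed block). A sketch with made-up input:

```shell
#!/bin/sh
# With RS="}", the "," after each closing brace starts the next record;
# sub() removes it, together with the newline after it, before printing.
printf '%s\n' '"a": {' '    "bar": "catch me if you can"' '},' \
              '"b": {' '    "bar": "blah"' '},' \
              '"c": {' '    "bar": "catch me if you can"' '}' |
awk -v RS='}' '/catch/ { sub(/^,\n/, ""); print $0 "}," }'
```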

Improvements

Jonathan Leffler's test file, data, also contains a block delimited by square brackets (block5) that includes catch me if you can but should not be printed. To handle that case with sed, require a { somewhere before catch in the accumulated block:

$ sed -n '/^"/{x;/{.*catch/p;d}; ${H;x;/{.*catch/p;d}; H' data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}

And for awk:

$ awk '{s=(s?s"\n":"") $0} /{/{f=1} f && /catch/{f=2} /^[]}]/{if (f==2) print s; f=0; s=""} ' data
"block1": {
    "foo": "abcd",
    "bar": "catch me if you can",
    "aaa": "12345"
},
"block4": {
    "foo": "xyz",
    "bar": "catch me if you can",
}
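Here is the same awk program spread over several lines with comments; the variable names s and f are the ones from the one-liner, and /[{]/ is used as an equivalent, more portable spelling of /{/. The made-up input contains a matching curly-brace block, a square-bracket block that also mentions catch, and a non-matching curly-brace block; only the first is printed.

```shell
#!/bin/sh
# Made-up input: "a" should print, "b" uses brackets, "c" lacks the pattern.
printf '%s\n' '"a": {' '    "bar": "catch me if you can"' '},' \
              '"b": [' '    "oops": "catch me if you can"' '],' \
              '"c": {' '    "rhubarb": "dandelion"' '}' |
awk '
    # Append the current line to the buffer s.
    { s = (s ? s "\n" : "") $0 }
    # An opening curly brace marks a block we care about.
    /[{]/ { f = 1 }
    # Inside such a block, remember that the pattern was seen.
    f && /catch/ { f = 2 }
    # A line starting with ] or } closes the block: print if it matched, then reset.
    /^[]}]/ { if (f == 2) print s; f = 0; s = "" }
'
```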
John1024 answered Oct 03 '22 at 00:10