I am trying to use awk to parse a multiline expression. A single one of them looks like this:
_begin hello world !
_attrib0 123
_attrib1 super duper
_attrib1 yet another value
_attrib2 foo
_end
I need to extract the values associated with _begin and _attrib1. So in the example, the awk script should return (one line of output per block):
hello world ! super duper yet another value
The separator used is a tab (\t) character. Spaces are used only within strings.
The following awk script does the job:
#!/usr/bin/awk -f
BEGIN { FS="\t"; }
/^_begin/ { output=$2; }
$1=="_attrib1" { output=output " " $2; }
/^_end/ { print output; }
You didn't specify whether you want a tab (\t) to be your output field separator. If you do, let me know and I'll update the answer. (Or you can; it's trivial.)
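For reference, here is a minimal sketch of the tab-separated variant mentioned above. It assumes the same tab-delimited input as the question; the function name join_records and the second sample record are made up for illustration. The only change from the script above is that values are joined with "\t" instead of " ".

```shell
# Hypothetical helper: reads tab-separated blocks from the named files
# (or from standard input if none are given), prints one line per block.
join_records() {
    awk 'BEGIN { FS = "\t" }
         /^_begin/        { output = $2 }              # start a new record
         $1 == "_attrib1" { output = output "\t" $2 }  # join values with a tab
         /^_end/          { print output }             # emit the finished record
        ' "$@"
}

# Sample input: printf turns \t and \n into real tabs and newlines.
printf '_begin\thello world !\n_attrib0\t123\n_attrib1\tsuper duper\n_attrib1\tyet another value\n_attrib2\tfoo\n_end\n_begin\tsecond record\n_attrib1\tonly value\n_end\n' | join_records
```

Because output is reset at every _begin line, each block produces its own output line, however many blocks the input contains.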
Of course, if you want a scary alternative (since we're getting close to Hallowe'en), here's a solution using sed:
$ sed -ne '/^_begin./{s///;h;};/^_attrib1[^0-9]/{s///;H;x;s/\n/ /;x;};/^_end/{;g;p;}' input.txt
hello world ! super duper yet another value
How does this work? Mwaahahaa, I'm glad you asked.
/^_begin./{s///;h;};
-- When we see _begin, strip it off and store the rest of the line to sed's "hold buffer".
/^_attrib1[^0-9]/{s///;H;x;s/\n/ /;x;};
-- When we see _attrib1, strip it off, append it to the hold buffer, swap the hold buffer and pattern space, replace newlines with spaces, and swap the hold buffer and pattern space back again.
/^_end/{;g;p;}
-- We've reached the end, so pull the hold buffer into the pattern space and print it.
This assumes that your input field separator is just a single tab.
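The same sed commands can also be laid out one per line with comments, which may make the walkthrough easier to follow. This is only a sketch: the wrapper name extract_with_sed is made up, and it relies on the same behaviour as the one-liner, namely that an empty regex in s/// reuses the address regex that just matched.

```shell
# Hypothetical wrapper around the one-liner, split into one command per line.
extract_with_sed() {
    sed -n '
        # _begin: delete the tag and the tab after it, then overwrite the hold buffer
        /^_begin./ {
            s///
            h
        }
        # _attrib1: delete the tag and tab, append the value to the hold buffer,
        # then turn the newline that H inserted into a space
        /^_attrib1[^0-9]/ {
            s///
            H
            x
            s/\n/ /
            x
        }
        # _end: copy the hold buffer back into the pattern space and print it
        /^_end/ {
            g
            p
        }
    ' "$@"
}

printf '_begin\thello world !\n_attrib0\t123\n_attrib1\tsuper duper\n_attrib1\tyet another value\n_attrib2\tfoo\n_end\n' | extract_with_sed
```

The [^0-9] after _attrib1 consumes the tab and also keeps tags such as _attrib10 from matching.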
SO simple. Who ever said sed was arcane?!
This should work (it reads the files named on the command line, or standard input if there are none):
#!/bin/bash
awk 'BEGIN {FS="\t"} $1=="_begin" {output=$2} $1=="_attrib1" {output=output " " $2} $1=="_end" {print output}' "$@"