
Sed to extract text between two strings

Tags: regex, shell, sed, awk

Please help me with sed. I have a file like the one below.

START=A
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=B
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=C
  xxxxx
  xxxxx
END
START=A
  xxxxx
  xxxxx
END
START=D
  xxxxx
  xxxxx
END

I want to get the text between START=A and END. I used the command below.

sed '/^START=A/, / ^END/!d' input_file

The problem is that I am getting

START=A
  xxxxx
  xxxxx
END
START=D
  xxxxx
  xxxxx
END

instead of

START=A
  xxxxx
  xxxxx
END

Sed seems to match greedily.

Please help me in resolving this.

Thanks in advance.

Can I use awk to achieve the above?

ranganath111 asked May 20 '13 05:05


2 Answers

sed -n '/^START=A$/,/^END$/p' data

The -n option means 'don't print by default'; the script then says 'print every line from a line matching START=A through the next line matching END'.
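The range can also be written in the question's `!d` style. A minimal sketch, assuming a small hypothetical sample file (the `b1`, `a1`, `d1` values are made up) and dropping the space after the comma from the question's command: in a basic regular expression `^` is an anchor only at the start of the pattern, so ` ^END` looks for a literal space, caret, and END. That never matches here, which would leave the range open until end of file. This, rather than greediness, looks like the probable cause of the extra output.

```shell
# Hypothetical sample file; the values are illustrative only.
data=$(mktemp)
cat > "$data" <<'EOF'
START=B
  b1
END
START=A
  a1
END
START=D
  d1
END
EOF

# Pattern from the question: ' ^END' never matches a line in this file,
# so the range that opens at START=A runs to the end of the file.
broken=$(sed '/^START=A/,/ ^END/!d' "$data")

# Stray space removed: delete every line that is outside the range.
fixed=$(sed '/^START=A$/,/^END$/!d' "$data")

# The -n ... p form, which selects the same lines as the fixed !d form.
printed=$(sed -n '/^START=A$/,/^END$/p' "$data")

printf 'broken:\n%s\n\nfixed:\n%s\n' "$broken" "$fixed"
rm -f "$data"
```

With this sample, `broken` holds the START=A block followed by the START=D block, while `fixed` and `printed` both hold only the START=A block.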

You can also do it with awk:

A pattern may consist of two patterns separated by a comma; in this case, the action is performed for all lines from an occurrence of the first pattern through an occurrence of the second.

(from man awk on Mac OS X).

awk '/^START=A$/,/^END$/ { print }' data

Given a modified form of the data file in the question:

START=A
  xxx01
  xxx02
END
START=A
  xxx03
  xxx04
END
START=A
  xxx05
  xxx06
END
START=B
  xxx07
  xxx08
END
START=A
  xxx09
  xxx10
END
START=C
  xxx11
  xxx12
END
START=A
  xxx13
  xxx14
END
START=D
  xxx15
  xxx16
END

The output using GNU sed or Mac OS X (BSD) sed, and using GNU awk or BSD awk, is the same:

START=A
  xxx01
  xxx02
END
START=A
  xxx03
  xxx04
END
START=A
  xxx05
  xxx06
END
START=A
  xxx09
  xxx10
END
START=A
  xxx13
  xxx14
END

Note that I modified the data file so that it is easier to see which part of the file each printed block came from.
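The distinct values also make it easy to check a variant that prints only the lines between the markers, without the START=A and END lines themselves. A minimal sketch, assuming a shortened copy of the data above and a hypothetical flag variable named `inside`:

```shell
# Shortened copy of the modified data file, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
START=A
  xxx01
  xxx02
END
START=B
  xxx07
  xxx08
END
START=A
  xxx09
  xxx10
END
EOF

# 'inside' is a hypothetical flag: set on START=A, cleared on END.
# 'next' skips the START=A line itself; the END line clears the flag
# before the final pattern is tested, so neither marker is printed.
bodies=$(awk '/^START=A$/ { inside = 1; next }
              /^END$/     { inside = 0 }
              inside' "$data")

printf '%s\n' "$bodies"
rm -f "$data"
```

For this input the command prints the xxx01, xxx02, xxx09 and xxx10 lines and nothing from the START=B block.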

If you have a different output requirement (such as 'only the first block between START=A and END', or 'only the last ...'), then you need to articulate that more clearly in the question.
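For illustration, here is one way those two variants might look. This is a sketch under assumptions, not the only approach: the sample file is a shortened copy of the data above, and `buf`, `keep` and `last` are hypothetical variable names.

```shell
# Shortened copy of the modified data file, written to a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
START=A
  xxx01
  xxx02
END
START=A
  xxx03
  xxx04
END
START=B
  xxx07
  xxx08
END
START=A
  xxx09
  xxx10
END
EOF

# First block only: print the range and quit as soon as its END is printed.
first=$(sed -n '/^START=A$/,/^END$/{p;/^END$/q;}' "$data")

# Last block only: buffer each START=A block, remember the most recent
# complete one, and print it after the whole file has been read.
last=$(awk '/^START=A$/ { buf = ""; keep = 1 }
            keep        { buf = buf $0 "\n" }
            /^END$/     { if (keep) last = buf; keep = 0 }
            END         { printf "%s", last }' "$data")

printf 'first:\n%s\n\nlast:\n%s\n' "$first" "$last"
rm -f "$data"
```

Quitting after the first END also avoids reading the rest of a large file; the last-block variant has to read everything because it cannot know which block is final until the input ends.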

Jonathan Leffler answered Oct 20 '22 01:10


Basic version ...

sed -n '/START=A/,/END/p' yourfile

More robust version, anchored to the whole line and tolerant of leading or trailing spaces...

sed -n '/^ *START=A *$/,/^ *END *$/p' yourfile
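A small sketch of why the anchors matter, assuming a hypothetical file that also contains a START=AB block (not part of the question's data): the basic pattern matches any line that merely contains START=A, so it picks up START=AB as well, while the anchored pattern does not.

```shell
# Hypothetical sample file with a START=AB block next to a START=A block.
data=$(mktemp)
cat > "$data" <<'EOF'
START=AB
  ab1
END
START=A
  a1
END
EOF

# Unanchored: 'START=A' is a substring of 'START=AB', so both blocks print.
basic=$(sed -n '/START=A/,/END/p' "$data")

# Anchored: the whole line, apart from surrounding spaces, must be START=A.
robust=$(sed -n '/^ *START=A *$/,/^ *END *$/p' "$data")

printf 'basic:\n%s\n\nrobust:\n%s\n' "$basic" "$robust"
rm -f "$data"
```

Here `basic` contains all six lines and `robust` contains only the START=A block.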
xagyg answered Oct 20 '22 01:10