
How to extract text between two words in unix?

I
am
using
basic
sed
expression :-

sed -n "am/,/sed/p" 

to get the text between "am" and "sed", which outputs "am \n using \n basic \n sed". But my real problem arises when the input is:

I
am
using
basic
grep
expression.

When I apply the same sed command to this input, it prints "am \n using \n basic \n grep \n expression.", which it should not. How can I discard the output when there is no match?

Any suggestions?

crazy_prog asked May 25 '11 16:05


3 Answers

The command in the question (sed -n "/am/,/sed/p", note the added slash) means:

  • Find a line containing the string am
  • and print (p) until a line containing sed occurs

Therefore it prints:

I am using basic grep expression

because it contains am. If you add more lines, they will be printed too, until a line containing sed occurs.

E.g.:

echo -e 'I am using basic grep expression.\nOne more line\nOne with sed\nOne without' | sed -n "/am/,/sed/p"

results in:

I am using basic grep expression.
One more line
One with sed

I think what you want is something like this:

sed -n "s/.*\(am.*sed\).*/\1/p"

Example:

echo 'I am using basic grep expression.' | sed -n "s/.*\(am.*sed\).*/\1/p"

prints nothing, because the line contains no sed, while

echo 'I am using basic sed expression.' | sed -n "s/.*\(am.*sed\).*/\1/p"

prints:

am using basic sed
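For input that sits on a single line, much the same result can probably be had with grep -o, which prints only the matched part of each line. The following is a minimal sketch, not the method from the answer above; the helper name extract_span is made up for illustration, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
# Hypothetical helper: print only the part of each input line that runs
# from the first "am" to the last "sed" on that line.
extract_span() {
  # -o prints just the matched text; with no match, grep prints nothing
  # and returns a non-zero exit status.
  grep -o 'am.*sed'
}

echo 'I am using basic sed expression.' | extract_span
echo 'I am using basic grep expression.' | extract_span || echo 'no match' >&2
```

A side benefit is that the exit status tells a calling script whether anything matched.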
bmk answered Sep 29 '22 04:09


You have to use a slightly different sed command, like:

sed -n '/am/{:a; /am/x; $!N; /sed/!{$!ba;}; /sed/{s/\n/ /gp;}}' file

This prints output ONLY when the text from am to sed is actually found, even if it spans multiple lines.
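If the sed one-liner is hard to follow, the same idea (buffer lines from the start pattern and print them only once the end pattern shows up) can be sketched in awk. This swaps in awk for sed and is only an illustration: the function name between and the variables start, end, inside and buf are assumptions, not part of the answer above.

```shell
# Hypothetical helper: print the lines from the first line matching START
# through the next line matching END, but only if END is actually found.
between() {
  awk -v start="$1" -v end="$2" '
    # Open a new buffer when a start line is seen outside a range.
    !inside && $0 ~ start { inside = 1; buf = "" }
    # While inside a range, keep collecting lines.
    inside                { buf = buf $0 "\n" }
    # Flush the buffer only when the end pattern appears.
    inside && $0 ~ end    { printf "%s", buf; inside = 0 }
  '
}

# Prints the four lines am, using, basic, sed.
printf 'I\nam\nusing\nbasic\nsed\nexpression.\n' | between am sed

# Prints nothing: "sed" never occurs, so the buffer is discarded.
printf 'I\nam\nusing\nbasic\ngrep\nexpression.\n' | between am sed
```

Unlike the sed range /am/,/sed/, this sketch also checks the end pattern on the same line that matched the start pattern.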

anubhava answered Sep 29 '22 04:09


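With sed this can work, but the syntax is quite overwhelming. If you need to crop part of a multi-line (\n) text, you might want to try a simpler way using GNU grep (-z reads the whole input as a single record so the match can cross newlines; -P enables Perl-compatible regular expressions):

cat multi_line.txt | grep -zoP '(?s)(?<=START phrase).*(?=END phrase)'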

For example, I find this the easiest way to grab a Perforce changelist description (without the rest of the CL info):

p4 describe {CL NUMBER} | grep -zoP '(?s).*(?=Affected files)'

Note: the lookbehind (?<=...) and lookahead (?=...) keep the starting/ending phrases out of the output. To include a phrase, match it directly instead of wrapping it in a lookaround.
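To make the include/exclude choice concrete, here is a small sketch on made-up sample text. It assumes GNU grep built with PCRE support (-P) and uses -z so the pattern can cross newlines; the function names and the sample input are purely illustrative.

```shell
# Made-up sample input for illustration.
sample() {
  printf 'header\nSTART phrase\nline one\nline two\nEND phrase\nfooter\n'
}

# Exclude the delimiters: the lookarounds assert them without consuming them.
between_exclusive() {
  # With -z, grep terminates each output record with a NUL byte; tr strips it.
  grep -zoP '(?s)(?<=START phrase\n).*(?=END phrase)' | tr -d '\0'
}

# Include the delimiters: match them directly instead of using lookarounds.
between_inclusive() {
  grep -zoP '(?s)START phrase.*END phrase' | tr -d '\0'
}

sample | between_exclusive   # line one, line two
sample | between_inclusive   # START phrase ... END phrase
```

Because .* is greedy, both variants run to the last END phrase in the input; using .*? instead would stop at the first one.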

3 revs answered Sep 29 '22 04:09