
Extracting string between 2 strings with bash shell script

I've seen questions similar to this, but none of the solutions seem to work in this case. I have a text file that looks something like this:

START-OF-FILE
RUNDATE=20140910
FIRMNAME=dl
FILETYPE=pc
REPLYFILENAME=TEST
DERIVED=yes
PROGRAMFLAG=oneshot
SECID=ISIN
SECMASTER=yes
PROGRAMNAME=getdata
START-OF-FIELDS
ISSUER
START-OF-DATA
US345370CN85|0|4|FORD MOTOR COMPANY|FORD MOTOR COMPANY| | |
US31679BAC46|0|4|FIFTH STREET FINANCE COR|FIFTH STREET FINANCE COR| | |
END-OF-DATA
END-OF-FILE

I'm trying to write a bash shell script to extract only the text between "START-OF-DATA" and "END-OF-DATA", excluding both of these markers. The output I'm looking for would look like this:

US345370CN85|0|4|FORD MOTOR COMPANY|FORD MOTOR COMPANY| | |
US31679BAC46|0|4|FIFTH STREET FINANCE COR|FIFTH STREET FINANCE COR| | |

The code I've written so far looks like this:

while read line
do
    name=$line

    echo $name | sed -e 's/START-OF-DATA\(.*\)END-OF-DATA/\1/'

done < $1

and I'm running it from bash like this:

./script.sh file.txt

where script.sh is the file I saved the shell script as and file.txt is the text file shown above. At the moment the script just reads and echoes the entire file. I'm guessing it's something silly in my syntax. Any pointers in the right direction would be much appreciated. Thanks.

tasslebear asked Sep 11 '14 11:09




2 Answers

Using awk you can do:

awk '/START-OF-DATA/{p=1;next} /END-OF-DATA/{p=0;exit} p' file
US345370CN85|0|4|FORD MOTOR COMPANY|FORD MOTOR COMPANY| | |
US31679BAC46|0|4|FIFTH STREET FINANCE COR|FIFTH STREET FINANCE COR| | |
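
The awk one-liner relies on a flag variable. Written out with comments, it might look like the sketch below; extract_awk is a hypothetical wrapper name used only for illustration, and the delimiters are assumed to be the ones from the question.

```shell
# Hypothetical wrapper around the awk approach; takes the input file as $1.
extract_awk() {
    awk '
        /START-OF-DATA/ { p = 1; next }      # start marker: set the flag, skip this line
        /END-OF-DATA/   { p = 0; exit }      # end marker: clear the flag, stop reading
        p                                    # bare pattern: print the line while p is set
    ' "$1"
}
```

Calling extract_awk file.txt should print only the lines between the two markers. The patterns are unanchored, so a data line that happens to contain one of the marker strings would also be treated as a marker.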

Or using sed:

sed -n '/START-OF-DATA/,/END-OF-DATA/{/START-OF-DATA\|END-OF-DATA/!p;}' file
US345370CN85|0|4|FORD MOTOR COMPANY|FORD MOTOR COMPANY| | |
US31679BAC46|0|4|FIFTH STREET FINANCE COR|FIFTH STREET FINANCE COR| | |
anubhava answered Oct 06 '22 00:10


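Your loop hands sed one line at a time, so a pattern that needs both "START-OF-DATA" and "END-OF-DATA" on the same line never matches, and every line is echoed unchanged. To make your solution work, you could set a marker to "True" (or similar) when you hit "START-OF-DATA" and clear it when you hit "END-OF-DATA". Using this marker, you could tell echo to print only while the marker reads "True", that is, while you are inside the relevant block of text.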
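
The marker approach described above can be sketched roughly as follows. The function name extract_block and the variable inside are hypothetical names chosen for illustration, and the sketch assumes each marker sits alone on its own line with no trailing whitespace or carriage return.

```shell
#!/usr/bin/env bash
# Sketch of the marker approach: print only the lines between the two delimiters.
# extract_block is a hypothetical helper name, not part of the original script.
extract_block() {
    local inside=false line
    # IFS= and -r keep leading whitespace and backslashes intact;
    # the -n test also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            START-OF-DATA) inside=true; continue ;;   # start marker: switch on, skip the line
            END-OF-DATA)   inside=false; break ;;     # end marker: switch off, stop reading
        esac
        if [ "$inside" = true ]; then
            printf '%s\n' "$line"
        fi
    done < "$1"
}

# Run against a file only when one is given on the command line.
if [ "$#" -ge 1 ]; then
    extract_block "$1"
fi
```

Saved as script.sh, this could be run as ./script.sh file.txt, as in the question. Using printf with a quoted variable instead of an unquoted echo keeps the runs of spaces inside the data lines from being collapsed.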

...or you could use sed:

sed -n '/START-OF-DATA/,/END-OF-DATA/ { //!p; }' file.txt
bryn answered Oct 05 '22 23:10