I have an XML file like the one below:
<root>
  <FIToFICstmrDrctDbt>
    <GrpHdr>
      <MsgId>A</MsgId>
      <CreDtTm>2001-12-17T09:30:47</CreDtTm>
      <NbOfTxs>0</NbOfTxs>
      <TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
      <IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
      <SttlmInf>
        <SttlmMtd>CLRG</SttlmMtd>
        <ClrSys>
          <Prtry>xx</Prtry>
        </ClrSys>
      </SttlmInf>
      <InstgAgt>
        <FinInstnId>
          <BIC>AAAAAAAAAAA</BIC>
        </FinInstnId>
      </InstgAgt>
    </GrpHdr>
  </FIToFICstmrDrctDbt>
</root>
I need to extract the value of each tag into a separate variable using an awk command. How can I do this?
Note that awk '{print $1}' prints the first field (by default, the first whitespace-separated word) of each line. If you use $3 instead, it prints the third field of each line.
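A quick way to see default field splitting in action. The two input lines are made up for illustration:

```shell
sample='alpha beta gamma
one two three'

# The braces matter: { print $1 } is the action awk runs on every input line.
first=$(printf '%s\n' "$sample" | awk '{ print $1 }')
third=$(printf '%s\n' "$sample" | awk '{ print $3 }')

printf '%s\n' "$first"   # alpha, then one
printf '%s\n' "$third"   # gamma, then three
```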
The awk implementation of cut among the gawk manual's example programs uses the getopt() library function (see Processing Command-Line Options) and the join() library function (see Merging an Array into a String). The current POSIX version of cut has options to cut fields based on both bytes and characters.
For example, awk -F":" '{ print $3 }' file.dat indicates that the given data file uses colon (:) characters to separate record fields. The -F option must come before the quoted program instructions. awk also lets you set the value of variables on the command line with the -v option.
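A minimal sketch combining both options on made-up colon-separated data (shaped like /etc/passwd). The awk variable name min and the sample records are assumptions for illustration only:

```shell
# Two colon-separated records: name:password:uid:gid
data='root:x:0:0
alice:x:1000:1000'

min_uid=1000
# -F":" sets the field separator; -v copies a shell value into the awk variable min.
# Both options go before the quoted program text.
users=$(printf '%s\n' "$data" | awk -F":" -v min="$min_uid" '$3 >= min { print $1 }')
printf '%s\n' "$users"   # alice
```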
The awk variables $1, $2 through $nn represent the fields of each record and should not be confused with shell variables that use the same style of names. Inside an awk script, $1 refers to field 1 of a record and $2 to field 2.
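The distinction matters mostly because of shell quoting. In the sketch below, show_fields is a hypothetical helper: single quotes keep the shell from touching $1, so awk reads it as field 1, while double quotes let the shell substitute its own $1 before awk ever runs:

```shell
# show_fields is a hypothetical helper used only for this illustration.
show_fields() {
  # Here $1 is the shell function's first argument.
  echo "shell \$1 is: $1"
  # Single quotes: awk receives $1 untouched and reads it as field 1.
  printf '%s\n' "red green blue" | awk '{ print "awk $1 is: " $1 }'
  # Double quotes: the shell expands $1 first. Called with "banana", awk receives
  # { print "bad: " banana } and prints an empty, uninitialized awk variable.
  printf '%s\n' "red green blue" | awk "{ print \"bad: \" $1 }"
}

out=$(show_fields banana)
printf '%s\n' "$out"
```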
You can use awk as shown below. However, this is NOT a robust solution and will fail if the XML is not formatted as expected, e.g. if there are multiple elements on the same line.
$ dt=$(awk -F '[<>]' '/IntrBkSttlmDt/{print $3}' file)
$ echo $dt
1967-08-13
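To get every value into its own shell variable, one option is to wrap the one-liner above in a small function and call it once per tag. This is a sketch, not a general XML parser: get_tag and the variable names are hypothetical, and it keeps the same assumption of at most one element per line. It also strips attributes such as Ccy="EUR" from the tag name before comparing:

```shell
# get_tag NAME FILE -- hypothetical helper: prints the text of the first
# element called NAME, assuming one element per line (as in the sample).
get_tag() {
  awk -F '[<>]' -v tag="$1" '
    {
      # $2 is the tag text, e.g. TtlIntrBkSttlmAmt Ccy="EUR"; keep only the name.
      split($2, parts, " ")
      if (parts[1] == tag) { print $3; exit }
    }' "$2"
}

xml=$(mktemp)
cat > "$xml" <<'EOF'
<root>
  <FIToFICstmrDrctDbt>
    <GrpHdr>
      <MsgId>A</MsgId>
      <CreDtTm>2001-12-17T09:30:47</CreDtTm>
      <NbOfTxs>0</NbOfTxs>
      <TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
      <IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
      <SttlmInf>
        <SttlmMtd>CLRG</SttlmMtd>
        <ClrSys>
          <Prtry>xx</Prtry>
        </ClrSys>
      </SttlmInf>
      <InstgAgt>
        <FinInstnId>
          <BIC>AAAAAAAAAAA</BIC>
        </FinInstnId>
      </InstgAgt>
    </GrpHdr>
  </FIToFICstmrDrctDbt>
</root>
EOF

# One shell variable per tag.
msg_id=$(get_tag MsgId "$xml")
cre_dt_tm=$(get_tag CreDtTm "$xml")
amount=$(get_tag TtlIntrBkSttlmAmt "$xml")
sttlm_dt=$(get_tag IntrBkSttlmDt "$xml")
sttlm_mtd=$(get_tag SttlmMtd "$xml")
bic=$(get_tag BIC "$xml")
rm -f "$xml"

echo "$msg_id $cre_dt_tm $amount $sttlm_dt $sttlm_mtd $bic"
```

Each call rereads the file, which is fine for a small message. For many tags, printing name=value pairs in a single awk pass would be cheaper.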
I suggest using a proper XML processing tool such as xmllint.
$ dt=$(xmllint --shell file <<< "cat //IntrBkSttlmDt/text()" | grep -v "^/ >")
$ echo $dt
1967-08-13
The following gawk command uses a regular expression as the record separator (RS) to match XML tags. Anything that starts with <, is followed by at least one character other than >, and ends with > is treated as a tag. Gawk stores the text that matched RS in the RT variable, and the text between tags becomes the record text, which gawk assigns to $0.
gawk 'BEGIN { RS="<[^>]+>" } { print RT, $0 }' myfile
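Building on that RS/RT behaviour, an element's text can be recovered by pairing each record with the tag that preceded it: when the previous RT was an opening tag and the current RT is a closing tag, $0 is the element's value. This is a gawk-only sketch (RT is a gawk extension) run on a shortened copy of the sample; it does not handle comments, CDATA, processing instructions or mixed content:

```shell
xml='<GrpHdr>
<MsgId>A</MsgId>
<TtlIntrBkSttlmAmt Ccy="EUR">0.0</TtlIntrBkSttlmAmt>
<IntrBkSttlmDt>1967-08-13</IntrBkSttlmDt>
</GrpHdr>'

pairs=$(printf '%s\n' "$xml" | gawk '
  BEGIN { RS = "<[^>]+>" }          # every tag acts as a record separator
  {
    # $0 is the text before the tag now in RT; prev is the tag before that text.
    if (prev ~ /^<[^\/]/ && RT ~ /^<\//) {
      name = prev
      gsub(/^<|[ >].*$/, "", name)  # "<TtlIntrBkSttlmAmt Ccy=...>" -> "TtlIntrBkSttlmAmt"
      print name "=" $0
    }
    prev = RT
  }')
printf '%s\n' "$pairs"
```

Only text sitting directly between an opening and a closing tag is printed, so the whitespace between container elements such as GrpHdr is skipped.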