
bash extract segments of a string and store in variables

Tags: linux, bash, awk

I want to convert the output from cppclean into cppcheck-like xml sections, such that:

./bit_limits.cpp:25: static data 'bit_limits::max_name_length'

becomes:

<error id="static data" msg="bit_limits::max_name_length">
    <location file="./bit_limits.cpp" line="25"/>
</error>

I started with some awk:

test code:

echo "./bit_limits.cpp:25: static data 'bit_limits::max_name_length'" > test.out
cat test.out | awk -F ":" '{print "<error id=\""$3"\""}
                           {print "msg=\""}{for(i=4;i<=NF;++i)print ":"$i}{print "\">"}
                           {print "<location file=\""$1"\" line=\""$2"\"/>"}
                           {print "</error>"}'

Note: to run this you need to put the cat command back into one line - I printed it over multi-lines for ease of reading.

Explanation: I am using awk and delimiting by colon ":" - which splits the line into useful chunks which I try to construct into the XML:

  • {print "<error id=\""$3"\""} - Extract the error ID part
  • {print "msg=\""}{for(i=4;i<=NF;++i)print ":"$i}{print "\">"} - extract the message (re-inserting the colons removed by the split; this covers all the remaining fields)
  • {print "<location file=\""$1"\" line=\""$2"\"/>"} - extract the file and line, this part is easy since the colons line up nicely
  • {print "</error>"} - finally print the end tag

This is close, but not quite right, it produces:

<error id=" static data 'bit_limits"
msg="
:
:max_name_length'
">
<location file="./bit_limits.cpp" line="25"/>
</error>

The id field should just be "static data" and the msg field should be "'bit_limits::max_name_length'", but other than that it is OK. I don't mind the output being split over multiple lines at the moment, though I would prefer that awk did not print a newline each time.

Update: As @charlesduffy pointed out, some context: I want to do this in bash because I want to embed this code into a makefile (or just a normal bash script) for maximum portability (i.e. no need for python or other tools).

code_fodder asked Mar 04 '23 09:03


1 Answer

With bash and a regex:

x="./bit_limits.cpp:25: static data 'bit_limits::max_name_length'"
[[ $x =~ (.+):([0-9]+):\ (.+)\ \'(.+)\' ]]

declare -p BASH_REMATCH

Output:

declare -ar BASH_REMATCH='([0]="./bit_limits.cpp:25: static data '\''bit_limits::max_name_length'\''" [1]="./bit_limits.cpp" [2]="25" [3]="static data" [4]="bit_limits::max_name_length")'

The elements 1 to 4 in array BASH_REMATCH contain the searched strings.
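
As a minimal sketch of how those elements might be stored in named variables and turned into the XML from the question: the variable names `file`, `line`, `id` and `msg` are illustrative assumptions, and the regex is kept in a variable `re` only to avoid the backslash escaping.

```bash
#!/usr/bin/env bash
x="./bit_limits.cpp:25: static data 'bit_limits::max_name_length'"
# Same pattern as above; stored in a variable so spaces and quotes need no escaping.
re="(.+):([0-9]+): (.+) '(.+)'"

if [[ $x =~ $re ]]; then
    # Copy the capture groups into named variables (names are hypothetical).
    file=${BASH_REMATCH[1]}
    line=${BASH_REMATCH[2]}
    id=${BASH_REMATCH[3]}
    msg=${BASH_REMATCH[4]}
    # printf emits exactly the newlines written in the format string.
    printf '<error id="%s" msg="%s">\n    <location file="%s" line="%s"/>\n</error>\n' \
        "$id" "$msg" "$file" "$line"
fi
```

Note that `$re` is left unquoted on the right-hand side of `=~`; quoting it would make bash treat it as a literal string rather than a regular expression.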

From man bash:

BASH_REMATCH: An array variable whose members are assigned by the =~ binary operator to the [[ conditional command. The element with index 0 is the portion of the string matching the entire regular expression. The element with index n is the portion of the string matching the nth parenthesized subexpression. This variable is read-only.
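
Since the elements are only meaningful after a successful match, it may help to test the exit status of `[[ ... ]]` before reading them. The following is a rough sketch of a stdin filter for a script or makefile recipe; the function name `cppclean_to_xml` is hypothetical, and silently skipping lines that do not match is an assumption about the desired behaviour.

```bash
#!/usr/bin/env bash
# cppclean_to_xml is a hypothetical helper name, not part of the original answer.
cppclean_to_xml() {
    local re="(.+):([0-9]+): (.+) '(.+)'" input
    while IFS= read -r input; do
        # Read BASH_REMATCH only when the match succeeded; other lines are skipped.
        if [[ $input =~ $re ]]; then
            printf '<error id="%s" msg="%s">\n' "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
            printf '    <location file="%s" line="%s"/>\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
            printf '</error>\n'
        fi
    done
}

# Example: one cppclean-style line and one line that does not match.
out=$(printf '%s\n' \
    "./bit_limits.cpp:25: static data 'bit_limits::max_name_length'" \
    "not a cppclean line" | cppclean_to_xml)
printf '%s\n' "$out"
```

This sketch does not escape XML special characters such as `<` or `&` in the captured text.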

Cyrus answered Mar 15 '23 17:03