
sed regex to match multiple fields and values, including quotes

I have a (space-separated) input file with lines such as:

field1=value1 field2="value 2" field3='value 3' field4="value '4'" ...

The number of fields varies from line to line. To process such a file properly, I would ideally like to run it through sed and obtain tab-separated output such as:

field1 (tab) value1 (tab) field2 (tab) value 2 (tab) field3 (tab) value 3 (tab) field4 (tab) value '4'

The furthest I have got so far is something like sed "s/\([a-z][a-z]*\)=\(['\"]\{0,1\}\)\(..*?\)\2/\t\1\t\3/g", but that is still far from solving my problem. My difficulty is handling values that may or may not be delimited by quotes. For the sake of elegance (or geekiness), I am sticking to sed, but I would also consider an awk alternative.

Thanks in advance for any help,

Edit: I am shocked to say, but @Jotne is right.

echo "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | sed "s/\([a-z][a-z]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"

does not work: field1=value1 field2="value 2" field3='value 3' field4="value '4'"

Though the following (the idea behind is to parse an audit.log file) works:

root@XXX:~# tail -n 2 /var/log/audit/audit.log 
type=CRED_DISP msg=audit(1570385821.075:670): pid=32605 uid=0 auid=0 ses=399 msg='op=PAM:setcred acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
type=USER_END msg=audit(1570385821.075:671): pid=32605 uid=0 auid=0 ses=399 msg='op=PAM:session_close acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
root@XXX:~# tail -n 2 /var/log/audit/audit.log | sed "s/\([a-z][a-z]*\)=\(\([^ ][^ ]*\)\|'\([^'][^']*\)'\|\"\([^\"][^\"]*\)\"\)/\1\t\3\4\5\t/g"
type    CRED_DISP    msg    audit(1570385821.075:670):   pid    32605    uid    0    auid   0    ses    399  msg    op=PAM:setcred acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success 
type    USER_END     msg    audit(1570385821.075:671):   pid    32605    uid    0    auid   0    ses    399  msg    op=PAM:session_close acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success   

Why?

Maguy IB asked Nov 07 '22 13:11


1 Answer

This might work for you (GNU sed):

sed -E 's/ \<([^ =]+)=("[^"]*"|'\''[^'\'']*'\'')/\t\1\t\2/g;s/=/\t/' file

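The first substitution rewrites every field that follows a space and has a quoted value, turning the leading space and the = into tabs. The second substitution then replaces the first remaining = (the one in the first field, which has no leading space) with a tab.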
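As far as I can tell, the command above keeps the quotes around each value and only rewrites fields whose value is quoted (plus the first field). If the goal is the exact output from the question, a single substitution with three alternatives may be closer. The following is only a sketch, not a tested drop-in: the function name kv_to_tsv is made up for illustration, and it assumes field names contain no spaces or =, quoted values contain no embedded quote of the same kind, and unquoted values are non-empty and do not start with a quote. A literal tab from printf is used instead of \t so that it does not depend on GNU-only escapes.

```shell
# kv_to_tsv: hypothetical helper, reads "key=value ..." lines on stdin
# and writes "key<TAB>value<TAB>key<TAB>value..." lines on stdout.
kv_to_tsv() {
  tab=$(printf '\t')   # literal tab character
  # Group 3: contents of a "double-quoted" value
  # Group 4: contents of a 'single-quoted' value
  # Group 5: an unquoted value (runs up to the next space)
  # Only one of the three groups takes part in a match; the others are empty,
  # so \3\4\5 expands to the value without its quotes.
  # The second command drops the tab left after the last field.
  sed -E "s/([^ =]+)=(\"([^\"]*)\"|'([^']*)'|([^ \"'][^ ]*)) ?/\1${tab}\3\4\5${tab}/g; s/${tab}\$//"
}

# Usage: the sample line from the question
printf '%s\n' "field1=value1 field2=\"value 2\" field3='value 3' field4=\"value '4'\"" | kv_to_tsv
```

Because each match consumes the whole value, key=value pairs nested inside a quoted value (as in the msg='op=PAM:setcred ...' part of audit.log) are left untouched rather than split again.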

potong answered Nov 12 '22 17:11