
awk field separator with regexp lookahead or lookbehind

Tags: regex, awk

I want to split a line on commas while treating an escaped comma (\,) as part of the field, but my attempt failed. For example:

$ echo "1,2\,2,333"|awk -F "(?<\!\\,)," '{print $2}'   ## expecting "2\,2"
awk: warning: escape sequence `\!' treated as plain `!'
awk: warning: escape sequence `\,' treated as plain `,'

Does awk/gawk support a field separator that uses regexp lookahead or lookbehind?

asked May 25 '15 02:05 by peihan


2 Answers

As I said in a comment, awk does not support look-ahead or look-behind, since it uses POSIX Extended Regular Expressions (ERE). If you really need look-ahead or look-behind, you might want to use Perl instead. However, in this case, you can slightly change your approach to solve the problem.

If your data contains escaped delimiters, then instead of splitting the data by looking for an unescaped delimiter (which can fail when there are several \ in a row), it is better to match the fields directly.

The regex to match the fields is /([^\\,]|\\.)+/: one or more items, each of which is either a character that is neither a backslash nor a comma, or a backslash followed by any character. Do note that this regex is not aware of quoted fields. Whether you can support them depends on how you deal with cases where the quotes are not closed properly, or where there is more than one quote in a field. If you can assume that your data is well-formatted, then you can just come up with a regex that works for your data.

Here is something to get you started. The code below prints all the fields in a line.

echo "1,2\,2,333" | awk '{while (match($0, /([^\\,]|\\.)+/)) {print substr($0, RSTART, RLENGTH);$0=substr($0, RSTART+RLENGTH)}}'
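To get a single field (as the question does with $2), the same matching loop can stop at the N-th match. The following is a minimal sketch for POSIX awk; nth_field is a hypothetical helper name, and the loop works on a copy of the line instead of overwriting $0.

```shell
# Hypothetical helper: print field N of a comma-separated line, where a
# backslash followed by any character (such as "\,") is field content.
nth_field() {
  awk -v n="$1" '
    {
      rest = $0      # work on a copy so $0 stays intact
      count = 0
      # Each match is the next field: a run of ordinary characters
      # and/or backslash-escaped characters.
      while (match(rest, /([^\\,]|\\.)+/)) {
        count++
        if (count == n) {
          print substr(rest, RSTART, RLENGTH)
          break
        }
        rest = substr(rest, RSTART + RLENGTH)
      }
    }'
}

printf '%s\n' '1,2\,2,333' | nth_field 2
```

Because the regex needs at least one character per field, empty fields (as in 1,,3) are skipped rather than counted, so this approach only suits data without empty fields.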

Reference

  • How to get match regex pattern using awk from file?
answered Sep 20 '22 12:09 by nhahtdh


One way to handle this is to use FPAT (which splits by field content rather than by separator) in GNU awk (gawk):

awk 'BEGIN{ FPAT=",([^\\\\]*\\\\,)*[^,]*,|[^,]+" } {
  for (i=1; i<=NF; i++) {gsub(/^,|,$/, "", $i); printf "$%d: <%s>\n", i, $i}
}' <<< "1,2\,2,333"
$1: <1>
$2: <2\,2>
$3: <333>
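As a possible simplification, the field regex from the other answer can be used as FPAT directly, so that no commas need to be stripped afterwards. This is a sketch only: it assumes GNU awk 4.0 or later (FPAT is a gawk extension), and second_field_fpat is a made-up helper name.

```shell
# Requires GNU awk: FPAT is a gawk extension and is not part of POSIX awk.
# A field is a run of non-backslash, non-comma characters and/or
# backslash-escaped characters. Backslashes are doubled once more
# because FPAT is given as a string, not as a regex literal.
second_field_fpat() {
  gawk 'BEGIN { FPAT = "([^\\\\,]|\\\\.)+" } { print $2 }'
}
```

With gawk installed, printf '%s\n' '1,2\,2,333' | second_field_fpat should print 2\,2. As with the match() loop, empty fields are not counted by this pattern.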
answered Sep 21 '22 12:09 by anubhava