awk field separator with regexp lookahead or lookbehind

Question

I want to split a line whose fields may contain escaped delimiters, but my attempt failed. For example:

$ echo "1,2\,2,333"|awk -F "(?<\!\,)," '{print $2}'   ## expecting "2\,2"
awk: warning: escape sequence `\!' treated as plain `!'
awk: warning: escape sequence `\,' treated as plain `,'

Does awk/gawk support lookahead or lookbehind in a field-separator regexp?

nhahtdh · Accepted Answer

As I said in the comments, awk does not support look-ahead or look-behind, since it uses POSIX Extended Regular Expressions (ERE). If you really need look-ahead or look-behind, you might want to use Perl instead. However, in this case, you can slightly change your approach to solve the problem.

If your data contains the delimiter, then instead of splitting the data by looking for an unescaped delimiter (which can fail when there are several \ in a row), it's better to match the fields directly.

The regex to match the fields is /([^\\,]|\\.)+/: a field is a run of characters that are each either something other than a backslash or comma, or a backslash followed by any character. Note that this regex is not aware of quoted fields. Supporting them depends on how you want to handle quotes that are not closed properly, or fields that contain more than one quote. If you can assume that your data is well-formed, you can just come up with a regex that works for your data.

Here is something to get you started. The code below prints all the fields in a line, one per output line.

echo "1,2\,2,333" | awk '{while (match($0, /([^\\,]|\\.)+/)) {print substr($0, RSTART, RLENGTH); $0 = substr($0, RSTART+RLENGTH)}}'
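Since the question asked for $2 specifically, the same loop can be adapted to return a single field. This is only a sketch: the function name escaped_field and the variable n are made up for illustration, and the regex is the field-matching one with doubled backslashes.

```shell
# Hypothetical helper: print field N (1-based) of each line read from stdin.
# Fields are separated by commas; a backslash escapes the next character.
escaped_field() {
    awk -v n="$1" '{
        line = $0
        i = 0
        # Consume one field per iteration, counting as we go.
        while (match(line, /([^\\,]|\\.)+/)) {
            if (++i == n) {
                print substr(line, RSTART, RLENGTH)
                break
            }
            # Drop the matched field and continue with the rest of the line.
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

printf '%s\n' '1,2\,2,333' | escaped_field 2    # prints 2\,2
printf '%s\n' 'a\\,b' | escaped_field 2         # prints b
```

Because the regex requires at least one character per field, empty fields are skipped rather than counted: for the input a,,b, field 2 is b. If empty fields matter in your data, this sketch would need to track the separators as well.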

Reference

How to get match regex pattern using awk from file?

anubhava · Answer

One way to handle this is using FPAT (splitting by content) in gnu-awk:

awk 'BEGIN{ FPAT=",([^\\\\]*\\\\,)*[^,]*,|[^,]+" } {
  for (i=1; i<=NF; i++) {gsub(/^,|,$/, "", $i); printf "$%d: <%s>\n", i, $i}
}' <<< "1,2\,2,333"
$1: <1>
$2: <2\,2>
$3: <333>
