I want to split a line on a delimiter that can also appear escaped inside a field, but my attempt failed. For example:
$ echo "1,2\,2,333"|awk -F "(?<\!\\,)," '{print $2}' ## expecting "2\,2"
awk: warning: escape sequence `\!' treated as plain `!'
awk: warning: escape sequence `\,' treated as plain `,'
Does awk/gawk support a field separator that uses regex lookahead or lookbehind?
As I said in the comments, awk does not support look-ahead or look-behind, since it uses POSIX Extended Regular Expressions (ERE). If you really need look-ahead or look-behind, you might want to use Perl instead. However, in this case, you can slightly change your approach to solve the problem.
If your data can contain an escaped delimiter, then instead of splitting the data by looking for an unescaped delimiter (which can fail when there are many \ in a row), it's better to match the fields directly.
The regex to match the fields is /([^\\,]|\\.)+/. Do note that this regex is not aware of quoted fields. If you want to support them, it depends on how you deal with cases where the quotes are not closed properly, or where there is more than one quote in a field. If you can assume that your data is well-formatted, then you can just come up with a regex that works for your data.
Here is something to get you started. The code below prints all the fields in a line.
echo "1,2\,2,333" | awk '{while (match($0, /([^\\,]|\\.)+/)) {print substr($0, RSTART, RLENGTH);$0=substr($0, RSTART+RLENGTH)}}'
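If you only need one particular field (as in the question), the same loop can be wrapped in a small shell function. This is only a sketch: the name nth_field and the variable names are made up for illustration, and, like the one-liner above, it skips empty fields because the regex needs at least one character to match.

```shell
# Hypothetical helper: print field number $1 of each input line,
# treating a backslash-escaped comma as part of the field.
nth_field() {
    awk -v n="$1" '{
        rest = $0; i = 0
        # Each match is one field: a run of ordinary characters
        # and backslash-escaped characters.
        while (match(rest, /([^\\,]|\\.)+/)) {
            field = substr(rest, RSTART, RLENGTH)
            if (++i == n) { print field; break }
            rest = substr(rest, RSTART + RLENGTH)   # drop what was consumed
        }
    }'
}

printf '%s\n' '1,2\,2,333' | nth_field 2    # prints 2\,2
printf '%s\n' 'a\\,b'      | nth_field 2    # prints b
```

The second call shows the case where a look-behind would go wrong: the backslash before the comma is itself escaped, so the comma is a real delimiter and the second field is b.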
One way to handle this is using FPAT (splitting by content) in gnu-awk:
awk 'BEGIN{ FPAT=",([^\\\\]*\\\\,)*[^,]*,|[^,]+" } {
for (i=1; i<=NF; i++) {gsub(/^,|,$/, "", $i); printf "$%d: <%s>\n", i, $i}
}' <<< "1,2\,2,333"
$1: <1>
$2: <2\,2>
$3: <333>
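As a possible simplification, the field-matching regex /([^\\,]|\\.)+/ from the other answer can be used as FPAT directly, so the commas never become part of a field and no gsub is needed. This is a sketch that assumes GNU awk 4.0 or later (FPAT is a gawk extension); the function name fpat_fields is hypothetical. Backslashes are doubled because FPAT is given as a string.

```shell
# Hypothetical helper; requires GNU awk, since FPAT is a gawk extension.
fpat_fields() {
    # FPAT describes what a field looks like: ordinary characters
    # or backslash-escaped characters, one or more times.
    gawk 'BEGIN { FPAT = "([^\\\\,]|\\\\.)+" }
    { for (i = 1; i <= NF; i++) printf "$%d: <%s>\n", i, $i }'
}

# Run the demo only when gawk is available
command -v gawk >/dev/null 2>&1 && printf '%s\n' '1,2\,2,333' | fpat_fields
```

With gawk installed, this should print the same three lines as the output above. As with the match loop, empty fields are skipped.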