
How do I put "\s|=" as a field separator in awk?

Tags: bash, shell, awk

This works:

awk  -F"[[:space:]]|=" '/^[^#]/{print($2)}'   /etc/fstab 

But this doesn't:

awk  -F"\s|=" '/^[^#]/{print($2)}'   /etc/fstab 

I'm using the awk that ships with Ubuntu 16.04.

e-satis asked Aug 31 '25 21:08


1 Answer

Welcome to the nightmare of bash escape sequences and of using string constants as regular expressions in awk. You passed a double-quoted string, -F"\s|=", which awk then uses as a regex.

How awk processes regex:

First of all, you need to understand that there are two ways to write a regex in awk :

  • you enclose it in slashes /ere/
  • you enclose it between quotes (such as is done with FS)

The latter, however, implies that your string will be parsed twice: the first time when awk reads your program, and the second time when it goes to match the string on the left-hand side of the operator with the pattern on the right (see the GNU awk manual).

Thus the expressions /\s|=/ and "\\s|=" describe the same regex, and so do /s|=/ and "\s|=".
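The double parsing can be checked with a small sketch. It deliberately uses \. (a literal dot) instead of \s, because \s is a GNU awk extension and may not be available in every awk; the sample line a.b=c and the variable names are made up for illustration.

```shell
# Sample input, invented for this sketch: split it on a literal dot or '='.
line='a.b=c'

# Regex constant: the text between the slashes is the regex itself.
n_regex=$(printf '%s\n' "$line" | awk '{ print split($0, parts, /\.|=/) }')

# String constant: awk first turns "\\.|=" into \.|= and only then
# compiles it as a regex, hence the doubled backslash.
n_string=$(printf '%s\n' "$line" | awk '{ print split($0, parts, "\\.|=") }')

echo "regex constant: $n_regex fields, string constant: $n_string fields"
# prints: regex constant: 3 fields, string constant: 3 fields
```

Both forms split the line into a, b and c, which is the same relationship that holds between /\s|=/ and "\\s|=".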

How bash handles the \:

Bash uses the \ character for escaping characters. A non-quoted backslash (\) preserves the literal value of the next character that follows (with few exceptions). A single-quoted backslash has no special meaning while a double-quoted backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or <newline>.

This now gives us the following options:

  • -F"\s|=": bash leaves \s untouched, so awk receives the string expression "\s|=" which is parsed as the regex /s|=/
  • -F"\\s|=": bash reduces \\ to a single \ and awk receives the string expression "\s|=" which is parsed as the regex /s|=/
  • -F"\\\s|=": bash reduces the first \\ to \ and leaves \s untouched, so awk receives the string expression "\\s|=" which is parsed as the regex /\s|=/
  • -F"\\\\s|=": bash reduces each \\ pair to \ and awk receives the string expression "\\s|=" which is parsed as the regex /\s|=/
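If in doubt, the shell's half of this can be inspected without involving awk at all. The sketch below uses a hypothetical helper, show_arg, which is not part of bash or awk; it only prints its argument exactly as the shell delivered it.

```shell
# show_arg is a hypothetical helper: it prints its first argument exactly as
# the shell delivered it, which is what awk would see after -F.
show_arg() { printf '%s\n' "$1"; }

show_arg "\s|="      # prints \s|=
show_arg "\\s|="     # prints \s|=
show_arg "\\\s|="    # prints \\s|=
show_arg "\\\\s|="   # prints \\s|=
show_arg '\\s|='     # prints \\s|=  (single quotes: nothing is removed)
```

Only the forms that print \\s|= leave awk with a backslash after it has processed the string itself.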

That said, the following are all equivalent:

$ awk -F '\\s|=' '/^[^#]/{print $2}' /etc/fstab
$ awk -F "\\\s|=" '/^[^#]/{print $2}' /etc/fstab
$ awk -F "\\\\s|=" '/^[^#]/{print $2}' /etc/fstab
$ awk 'BEGIN{FS="\\s|="}/^[^#]/{print $2}' /etc/fstab
$ awk "BEGIN{FS=\"\\\\s|=\"}/^[^#]/{print \$2}" /etc/fstab

There are three quoting mechanisms: the escape character, single quotes, and double quotes.

  • A non-quoted backslash (\) is the escape character. It preserves the literal value of the next character that follows, with the exception of <newline>. If a \<newline> pair appears, and the backslash is not itself quoted, the \<newline> is treated as a line continuation (that is, it is removed from the input stream and effectively ignored).

  • Enclosing characters in single quotes preserves the literal value of each character within the quotes. A single quote may not occur between single quotes, even when preceded by a backslash.

  • Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !. The characters $ and ` retain their special meaning within double quotes. The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or <newline>. A double quote may be quoted within double quotes by preceding it with a backslash. If enabled, history expansion will be performed unless an ! appearing in double quotes is escaped using a backslash. The backslash preceding the ! is not removed.

Source: man bash, section QUOTING
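As a rough illustration of the three mechanisms quoted above, the following sketch stores the same text under each kind of quoting and prints the result. The variable names are invented for the example.

```shell
unquoted=\$HOME      # backslash outside quotes escapes the $
single='\$HOME'      # single quotes keep the backslash literally
double="\$HOME"      # in double quotes, \ before $ is removed
kept="\d"            # in double quotes, \ before an ordinary letter stays

printf '%s\n' "$unquoted" "$single" "$double" "$kept"
# prints:
# $HOME
# \$HOME
# $HOME
# \d
```

The last line is the case that matters for -F"\s|=": inside double quotes the backslash before s survives the shell, and it is awk that drops it afterwards.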

kvantour answered Sep 03 '25 13:09