Does the awk split() function use a regular expression or an exact string constant?

If we have ip=192.168.0.1 and we call split(ip, myArray, "."), myArray will contain "192" at position 1, "168" at position 2, "0" at position 3 and "1" at position 4.

My question is: why does awk not interpret the "." as the "any character" regular expression?

What would I need to do if I wanted awk to interpret the "." as the "any character" regular expression for matching?

Will this behaviour be consistent across all awk implementations?

Maytas Monsereenusorn asked Apr 07 '17 12:04

1 Answer

This is really a dark corner of awk....

I had the same question about 5 years ago. I submitted it as a bug, discussed it with a gawk developer, and finally got it cleared up: it is a "feature".

Here is the ticket: https://lists.gnu.org/archive/html/bug-gawk/2013-03/msg00009.html

split(str, array, magic)

For magic:

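  • When you pass a non-empty string (written in double quotes, "..."), awk checks its length. If it is a single character other than a space, it is used as a literal string (they call it a separator), even if that character is a regex metacharacter. A single space is a special case that splits on runs of whitespace. If the string is longer than one character, it is treated as a dynamic regex.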

  • When you pass a static regex, that is, one written as /.../, it is always treated as a regex, no matter how long the expression is.

That is:

"."  - literal "." (period)
"["  - literal "["
"{"  - literal "{"
".*" - regex
/./  - regex
/whatever/ - regex
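
A quick way to check the string cases above is to run them through split() yourself. This is only a sketch, assuming a POSIX-style awk (gawk, mawk, etc.) on your PATH; the wrapper name split_demo and the sample strings are made up for illustration.

```shell
# Hypothetical helper: prints the element count and the first three
# elements for three different string separators.
split_demo() {
    awk 'BEGIN {
        # One-character string: the period is taken literally.
        n = split("a.b.c", parts, ".")
        print n, parts[1], parts[2], parts[3]

        # One-character string: "[" is literal, not the start of a bracket expression.
        n = split("x[y[z", parts, "[")
        print n, parts[1], parts[2], parts[3]

        # Longer string: treated as a dynamic regex (one or more digits).
        n = split("a1b22c", parts, "[0-9]+")
        print n, parts[1], parts[2], parts[3]
    }'
}

split_demo
# 3 a b c
# 3 x y z
# 3 a b c
```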

If you want awk to treat . (period) as a regex metacharacter, use split(foo, bar, /./). But if you split on any character, every character becomes a separator, so the array will contain only empty strings; make sure that is what you really want.
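
To see the difference on the IP address from the question, here is a small sketch, again assuming a POSIX-style awk on your PATH. The function name ip_split_demo and the array names are hypothetical, chosen just for this example.

```shell
# Hypothetical helper: splits the same string with "." and with /./.
ip_split_demo() {
    awk 'BEGIN {
        ip = "192.168.0.1"

        # String ".": literal period, so we get the four octets.
        n = split(ip, octets, ".")
        print n, octets[1], octets[4]

        # Regex constant /./: each of the 11 characters is a separator,
        # which leaves 12 elements, all of them empty strings.
        n = split(ip, pieces, /./)
        empty = 0
        for (i = 1; i <= n; i++)
            if (pieces[i] == "")
                empty++
        print n, empty
    }'
}

ip_split_demo
# 4 192 1
# 12 12
```

So the regex form does work, it just is rarely useful as a separator on its own.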

Kent answered Oct 02 '22 03:10